BORIS SEMEONOFF
ERIC TRIST 


Diagnostic 
Performance Tests 


A MANUAL FOR USE WITH ADULTS 


FOREWORD BY P. E. VERNON 


Professor of Educational Psychology 
in the University of London 


TAVISTOCK PUBLICATIONS LIMITED 


First published in 1958 by
Tavistock Publications Limited 
2 Beaumont Street, London, W.1 
and printed in Great Britain by 
J. W. Arrowsmith Ltd., Bristol 
in 10pt. Times Roman type


© Tavistock Publications Limited, 1958 




CONTENTS 


—— 


FOREWORD page xi
AUTHORS’ PREFACE xiii
1. Introduction 1
2. The Semeonoff-Vigotsky Test 8
3. The Trist-Hargreaves Test 27
4. Trist-Misselbrook-Kohs 41
5. The Carl Hollow Square 55
6. The Revised Passalong Test 70 
7. Norms 80 
8. Statistical Data 98 
APPENDICES 
I. The Shortened Wechsler Verbal Scale 119 
II. The Interpretation of Standard Equivalent Scores corrected 
for Age 123 
III. The Mitchell Vocabulary Test 135 
IV. Translated Versions of the Reasoning Test 148 
V. Illustrative Case Material 150
REFERENCES 163 
INDEX 167 
ERRATA 


p. 6, line 2. ‘neogenetic’ should read ‘noegenetic’
p. 74, para. (i). ‘bonus score’ should read ‘[…] score’
p. 154, line 1. ‘hlevant’ should read ‘relevant’
p. 154, line 2. ‘tean’ should read ‘than’


PLATES 


Semeonoff-Vigotsky material facing page 16
Trist-Misselbrook-Kohs: test in progress 17
Carl Hollow Square: test in progress 32
Trist-Hargreaves material 33


LINE DRAWINGS 


Trist-Misselbrook-Kohs designs page 44-45 
Passalong end-positions 73 


TABLES 


Comparison of Colours in the Semeonoff-Vigotsky and
Hanfmann-Kasanin Tests page 24
Dimensions of Semeonoff-Vigotsky Blocks 25 
T-M-K.: Summary of the Test Material 42 


Relation of T-M-K Designs to Original Kohs Designs 43 
Carl Hollow Square: Source of bonus and penalty values 60 
Carl Hollow Square: Order of presentation of problems 68 
Carl Hollow Square: Time Scoring Schedule—Bonus or 


Penalty 68 
Carl Hollow Square: Moves Scoring Schedule—Bonus or 
Penalty 69 
Passalong: Specifications of Blocks 7
Passalong: Specifications of Boxes 71
Passalong: Time Scoring Schedule 78 
Passalong: Moves Scoring Schedule 79 
Passalong: Speed correction 79 
Percentage Distribution of the Selection Grade Scale 81 
Performance Tests: Percentile Norms 83 
Equivalent Scores: Pemberley Battery 84-85 
Pemberley Battery: Conversion of Sum of Three Equivalent 
Scores to Intelligence Grading 86 
Equivalent Scores: Army Officer Population 87-90 
Conversion of Summed Equivalent Score to Officer Intelli- 
gence Rating 91 
T-Scores: Garston Data 92-93 
Age Norms: Matrices, T-M-K, and Carl Hollow Square 94 
Pemberley Tests: Uncorrected Correlations 101 
Pemberley Tests: Correlations, Corrected for Restriction of 
Range, of Performance Tests with Criterion Tests 102 
Pemberley Tests: Intercorrelations of Performance Tests, 
corrected for Restriction of Range 102 
Correlations, All Tests: Pemberley ‘X’ Sample 103
Garston Tests: Uncorrected Correlations 103 


Garston Tests: Correlations, corrected for Restriction of 
Range, of Performance Tests with Criterion Tests, and 
with one another 104 

Garston Tests: Uncorrected Correlations between Criterion 
Tests: Groups A and B, and Whole Sample 104 

Comparison of Correlations from Pemberley and Garston 
Populations 105 

Mean Correlations of Performance Tests with Pooled Ver- 
sions of Matrices and Reasoning: Pemberley Data 106 


Vernon’s Loadings for Principal Tests page 107 
Tests included in Pemberley Analyses 107 
Factor Analyses of Pemberley Test Batteries: Unrotated 
Loadings 108 
Factor Analyses of Pemberley Test Batteries: Rotated
Loadings 110 
Garston Factor Analyses: Unrotated Loadings 111 
Pemberley Sample Analysed for Discrepancies in Intelli- 
gence Grading 114 
Discrepancies in Intelligence Gradings 114 
Discrepancies in Intelligence Gradings based on two written 
tests and one Performance Test 115 
Equivalent Scores corrected for Age 127-129 
Equivalent Scores: Conversion of Summed Equivalent Score 
to Officer Intelligence Rating 130 
Mitchell Vocabulary Test: Word Content, and Order in 
Successive Drafts 138 
Mitchell Vocabulary Test: Author’s Unrotated Loadings 141 


Factor Loadings of the Mitchell Vocabulary Test and 
Associated Tests 141 

Reasoning Test: Content of Translated Versions 148 

Comparison of Performance on Translated Versions of 
Reasoning Test 149 


FOREWORD 


Looking back over the historical growth of diagnostic or clinical psycho- 
logical testing, one is struck by the relative meagreness of the contribution 
that British psychologists have made so far. Sir Cyril Burt, who did so much 
to develop the statistical or psychometric aspects of testing in this country, 
has also always emphasized the ‘case-study approach’, and pointed out the 
value of the insights which the experienced individual tester can get into a 
child’s temperamental tendencies. And his approach has been followed both 
by psychologists working in British child guidance clinics, and in the National 
Institute of Industrial Psychology’s methods of vocational guidance. There 
are other honourable names in this field, such as those of C. J. C. Earl, 
Margaret Lowenfeld, and Professor Zangwill. Yet, up till the Second World 
War, the number of psychologists working in mental hospitals or other in- 
stitutions catering for adult patients could probably be counted on the fingers 
of one hand. By contrast, in 1956 there were some 80 fully qualified members 
of the Adult Section of the British Psychological Society’s Committee of Pro- 
fessional Psychologists (Mental Health); and recognized courses for the 
training of postgraduate psychologists for clinical work with adults are 
established, even if on nothing like the same scale as in the United States of 
America. Nevertheless one must still admit that the more psychometric 
approach of group testing (e.g. in selection for secondary education, or in the 
Defence Services), of factorial analysis, and of objective experimental studies 
of psychiatric problems, has a higher prestige and attracts a much larger 
number of devotees in this country than in America. 

Much of the credit for the growth of diagnostic testing since 1939 belongs 
to the small group of Army psychologists who co-operated with psychiatrists 
in devising methods of officer selection—the famous War Office Selection 
Boards—in work at Emergency Mental Hospitals, in rehabilitation of ex- 
prisoners, and other special assignments which necessitated more detailed 
individual methods of assessment than the large-scale techniques employed 
for classification of the bulk of Army personnel. Mr Trist was the pioneer 
in this work, and he was later joined by Dr Semeonoff and a number of 
others whose help is acknowledged in the present volume. 

The difficulties of making really worth-while contributions to diagnostic 
testing are not far to seek. As the authors clearly bring out, the psychologist 
needs a special sensitivity to his testee’s responses—partly based on careful 
collation of past observations, partly on human sympathy and intuition— 
which enables him to control and manipulate the subjective test situation (as 
distinct from the objective conditions), and to recognize signs of anxiety,
rigidity, obsessional and other deeper emotional tendencies which affect the
testee’s performance. Psychologists who are best at this are also often anti- 
pathetic to the colder, analytic approach of the psychometrist, which is 
nevertheless essential if accurate norms and correlational studies (which 
underlie the scientific interpretation of any test) are to be developed. Thus it 
is rare to find individuals like the authors who can successfully combine 
these two aspects of clinical psychology. The whole set-up of diagnostic work 
is often unfavourable: the Army authorities or mental hospital doctors may 
see little need for the psychologist’s expertise; while the more psycho- 
dynamically-oriented psychiatrist, though less conservative, is still less inter- 
ested in standardized application and statistical treatment of results. Again, 
scientifically respectable results usually require large numbers of cases— 
hence the attractions of group testing. The clinical psychologist, however, 
may well have to spend a day or more testing, and formulating his conclusions 
about, each case; and he is only too well aware that each one is a different
personality and of the artificialities of merely classifying test scores. Finally,
in Britain, there is not even any professional journal which is willing to
publish more than an occasional article in this field. 

Under these circumstances, Semeonoff and Trist's book represents a great 
step forward, and it is to be recommended to psychologists and psychiatrists 
generally as well as to those specifically concerned with adult diagnosis. It 
will greatly extend the range of the psychologist's equipment for exploring 
the adult intellect and disturbances of intellectual functioning. In particular, 
it helps to fill the gap between the more routine tests such as Wechsler- 
Bellevue and Raven Matrices, and the much more subjective projection 
techniques. The book will stand alongside Goldstein and Scheerer’s mono- 
graph, Rapaport's Diagnostic Psychological Testing and other American 
manuals, and will certainly not suffer by comparison. Perhaps more important 
still, it will encourage other clinical psychologists to carry out further research 
and test development, and set a pattern of painstaking observation and 
cautious interpretation for them to follow. 

P. E. VERNON


AUTHORS' PREFACE


In the compilation of this volume and in the development of the test material
on which it is based the authors have played complementary roles over a
number of years. The approach to psychological testing which the book
exemplifies is one that attempts to bring together qualitative and quantitative
methods in a flexible yet rigorous manner. It was developed by the second
author at the beginning of the late war, when he was serving as a research
psychologist at Mill Hill Emergency Hospital. He then joined the War Office
Selection Boards (W.O.S.B.s) as Senior Psychologist. Though in the wider
military setting new opportunities arose to link the special performance tests
under development to a number of group tests and to secure certain normative
data, they seemed at first to have little more than a confirmatory role in
special or borderline cases. New assignments, however, soon faced the
W.O.S.B. organization in which extensive use of performance tests became
essential. It was at this point that the first author entered the picture, as he
became the officer responsible for this work. From then onwards the
development of the material and the technique has been in his hands. Since the
war he has continued working with these methods at the University of
Edinburgh, where additional experience has been gained and fresh data
gathered. At the request of the Tavistock Institute of Human Relations,
which provided a grant in aid, he has brought together the war-time and the
post-war material in the present manual. For the scope and content of this,
and the views expressed, joint responsibility is accepted by both authors,
but the text itself is the work of the first author.

Since the treatment is primarily descriptive, no attempt has been made to
provide a comprehensive survey of similar work preceding or subsequent to
our own. The book should therefore be regarded as a manual rather than as
a systematic treatise. A generalized rationale of the testing method, however,
is set out briefly in the opening chapter; some account of the historical
background of the various techniques and of their development in the operational
setting is included at appropriate points in the chapters immediately following.
These, together with chapters on norms and other statistical data,
constitute the main portion of the book.

Allied techniques, which do not fall within the category of performance
tests, are treated in the Appendices. They are included in the book because
they formed part of the same broadly based project, and it was thought appropriate
that as much of the hitherto unpublished work as possible should be
presented together. Appendices I to IV describe, respectively: a shortened
adaptation of a well-known individually applied psychometric instrument
(Wechsler-Bellevue); a ‘diagnostic’ use of two paper-and-pencil tests (the
equally well-known Progressive Matrices and its less familiar companion
test, Mill Hill Vocabulary); a new written test based on a fresh approach to
the relation of vocabulary to intelligence; and an attempt to produce an
unusually wide range of translated versions of an existing test. Each of these
projects had its roots in Service requirements, but the circumstances in which 
each was developed were similar to such as may be encountered in many 
fields of civilian employment. Areas (e.g. selection for management) in which 
these and similar methods—including, of course, the performance tests— 
might be employed will no doubt suggest themselves to readers. Their applica- 
bility to clinical purposes will, it is hoped, be self-evident. It should be noted, 
however, that the tests are intended for use mainly with adult subjects, and if 
they are applied to children, the diagnostic inferences must not be assumed 
to hold. On the psychometric side, however, it may be noted that tentative 
age norms are quoted for Trist-Misselbrook-Kohs (T-M-K); that the author 
of the Carl Hollow Square intended the test to be applicable down to the age 
of eight; and that there is no reason why our method of scoring Passalong 
should not be used for children as well as for adults. As regards the concept- 
formation tests, the position is rather different. Such experience as we have 
had in administering Trist-Hargreaves to children suggests that discrimina- 
tion is possible, but that the nature of the test takes on a different complexion 
below the adult level. Semeonoff-Vigotsky must be regarded as a purely
adult test. 

A final Appendix contains some detailed case material, which would have
been out of place in the main text. 


Materials for the performance tests may be had, together or separately, 
from the publishers or from the National Foundation for Educational 
Research in England and Wales, 79 Wimpole Street, London W.1. Short 
Instructions are included with each test; and scoring blanks, made up in pads 
of fifty, are available where relevant. It is recognized that privately made sets 
of Semeonoff-Vigotsky and Trist-Hargreaves exist. Copyright subsists in 
these tests, and in the ‘boards’ for Trist-Misselbrook-Kohs, but no objection 
is entertained to people continuing to use existing materials, provided that 
they conform sufficiently closely to the specifications (which will be found in 
this book). The Mitchell Vocabulary Test may also be had from the National 
Foundation for Educational Research. 

Reference is made elsewhere to our view that much work still remains to be 
done on all the tests. It is hoped that when a reasonable volume of research 
data is available it will be possible to collate the results, and perhaps to cast 
additional light on the problems of subject-examiner interaction. Users are 
therefore asked to keep in touch with the first author, and any data they may 
care to communicate will be gratefully received and acknowledged. 


This book is published with the permission of the War Office, under whose
auspices most of the work on which it is based was done, but the responsibility
for any statement of fact or opinion rests with the authors.
Acknowledgement is made to Dr W. J. Morgan of Aptitude Associates, 
Merrifield, Virginia, for permission to use the code name Pemberley; to the 
Editor of the Journal of Psychology to quote from G. P. Carl's paper; to the 
following for permission to make reference to data obtained with tests as
named, or to reproduce or adapt the authors' accounts of their work: Dr
W. P. Alexander in respect of the Passalong; Mr Alex Mitchell in respect
of the Mitchell Vocabulary test; Dr D. Wechsler and the Psychological
Corporation, New York, in respect of the sub-tests of the Wechsler-Bellevue
Intelligence Scale. 
Finally, the authors would like to express their indebtedness to Mr H. 
Phillipson of the Tavistock Clinic for valuable liaison work and other 
assistance at all stages of the project. 


CHAPTER 1


Introduction 




The main object of this book is to provide instructions for the use of certain 
test material, some of it new and hitherto unpublished, and some representing 
a modification of tests and techniques already well established. Most of the 
work on which it is based was carried out during the 1939-45 war, partly at 
Mill Hill Emergency Hospital, and partly at War Office Selection Boards 
(W.O.S.B.s), where some of the tests were used as ‘confirmatory’ tests with 
the intention of obtaining, in cases of doubt, a more reliable intelligence 
rating. They were put to wider use at a special Assessment Board of the Inter-
Services Research Bureau at ‘Pemberley’,¹ where the ‘performance test situation’
was treated also as an opportunity for observation of a more qualitative
kind, and ultimately as a means of arriving at a disposal rating taking both 
cognitive and personality factors into account. 

Considerations of security prevented the dossiers and other detailed 
documents from being made available. Consequently it is not now possible 
to present data relating to the validity of the performance test situation as a 
basis for assessment in a selection procedure, nor to adduce evidence for the
interpretative significance of patterns of behaviour, idiosyncrasies of approach 
etc., observed in the testing programme at Pemberley. Tentative hypotheses 
are advanced in connexion with the discussion of each test, and it is hoped to 
examine these in the light of future experience. In particular it is hoped that 
users of the tests will be able to recognize and formulate the ‘sensitivities’ in 
the material. Most of the techniques will yield different sorts of information 
in different contexts. To attempt to define too closely the precise function of 
any given test is certainly premature, and probably an illusory aim at the 
best. 

The present publication is, therefore, primarily concerned to call attention 
to the quantifiable aspects of certain new or modified techniques, i.e., the
test battery for which it is designed to act as a manual. Provisional norms are
given, and the results of some factorial studies quoted.

¹ A name used by W. J. Morgan (30), whose book describes other aspects of the same
work.

The psychological test situation represents an interaction of personalities, 
in which the examiner's individual contribution cannot be entirely accounted 
for, even by the most careful standardization of instructions. The intervention 
of the examiner's personality has always been assumed to have less effect on 
cognitive than on projective testing, but even in the former case it is doubtful 
whether one can always be confident that it has not been operating. ‘Diagnostic’
testing (using the term in Rapaport's sense), with which we are primarily
concerned, not only constitutes a social situation of the type described but 
actually depends in many cases for its effectiveness on active participation by 
the examiner at the time of testing no less than at the interpretation stage. It 
is for this reason that ‘universal’ norms for the tests here described (and for
others) are of somewhat uncertain value. Even in comparing performances
by different subjects tested by the same examiner it is necessary to bear in
mind the possible differences in impact of the examiner's personality. There 
is something to be said, therefore, for developing one's own norms appropri- 
ate to the characteristics of the population with which one is dealing, and to 
one's approach as examiner. 

Nevertheless, and perhaps also in consequence of the complexities inherent 
in the administration of these tests, it has been considered desirable to present 
very full instructions, in order to minimize the risk of chance variation of an 
unnecessary kind, as distinct from the deliberate and purposeful modification 
of procedure, which, as will be seen, forms an essential part of the technique. 

Modification to suit circumstances is considered to be particularly necessary
in relation to the manner in which the test material is presented to the
subject, and to the form of words used to explain the requirements of the 
task he is being required to perform. Full comprehension on the part of the 
subject of what he has to do is of course essential, and the examiner must be 
prepared to spend as much (or, in some cases, as little) time on this stage of 
the administration as individual circumstances demand. The techniques 
described are not, in general, designed for routine use, and are more suitable 
for application under conditions where there is ample time for detailed in- 
vestigation. On occasion the subject's interest or other aspects of his personal 
reaction to the test situation will make rigid enforcement of ostensibly 
standardized conditions likely to lead to the loss of valuable information. 
When such circumstances arise it is up to the examiner to decide whether to 
attempt to maintain sufficient control of the situation to make a quantitative 
assessment of performance meaningful within the framework of the available 
norms, or to allow it to develop along more informal lines. Further reference 
will be made to this point later; meanwhile the reader is asked to note that
the traditionally recommended uniformity of presentation is to be regarded
as not only unnecessary, but spurious, since standardization of instructions 
must surely be regarded as aiming primarily at ensuring full comprehension, 
which is itself merely a step towards full participation by the subject. 

Special problems are encountered when administering tests to children, 
but since the tests in the present text are intended primarily for adults these 
problems do not arise, although comparable difficulties (more fully dealt 
with in the appropriate places) sometimes arise with adult subjects of low 
intelligence, or who are seriously disturbed. Where a test (e.g. the Carl 
Hollow Square, see Chapter 5) purports to be appropriate for subjects of all 
ages from about seven years upwards, the absurdity of attempting to make 
use of identical instructions on all occasions becomes manifest. A cognate 
point may arise when, as in the case of the Passalong (see Chapter 6) or the 
Trist-Misselbrook-Kohs test (see Chapter 4), the range of difficulty is steeply 
graded, and an unfavourable attitude may be adopted by a subject who forms 
the impression that he is being asked to do something unworthy of his 
attention. In such a case it may be regarded as legitimate to prepare the sub- 
ject for the apparent simplicity beforehand, and to assure him that tasks 
appropriate to his capacity will follow. On the other hand, it is embarrassing 
to subject and examiner alike if it turns out that the subject is unable, after 
all, to cope with even the initial problems which have just been described as 
childishly simple. Sensitivity to the implications for the subject of success or 
failure in such a situation, and, in general, the ability to cope with all eventu- 
alities is a skill which comes with experience rather than by precept; example, 
of course, also helps, but it is hoped that the discussion accompanying the 
instructions contained in this handbook will help to demonstrate that the 
skill referred to is not entirely incommunicable by the printed word alone. 

Before passing on to a detailed account of the various tests here presented, 
it may be of interest to consider the background against which they were 
developed. The main standardization was carried out in a military or
para-military setting, but this work in its turn developed more or less directly from
that of Trist at the Mill Hill Emergency Hospital, where the situation had to 
be faced that no standardized cognitive tests were then available for clinical 
use with adults. To meet this purpose it was desired to build up a battery of 
tests, both verbal and performance, each of which would be standardized on 
its own merits, and which could eventually be related to one another. It was 
also intended that the tests should be capable of bringing out qualities of 
performance, or in other words, that they should throw light on the behaviour 
related to disturbances of cognitive function accompanying neurosis, brain 
lesion, etc. 

The work of the Mill Hill Hospital afforded an opportunity for studying
the performance of disturbed subjects at just the time when the expansion of
psychological work in the Services gave opportunities for general standardiza- 
tion. A related project at Mill Hill was the development of a modification of 
the Wechsler-Bellevue (see Appendix I), at that time a very new instrument,
which was eventually widely used in the Army and the Royal Navy as a 
supplement to group tests, especially in order to allow a high level of dis- 
crimination at the top and bottom of the scale of intelligence. The Trist- 
Misselbrook modification of Kohs Blocks (T-M-K) came into use for the 
same purpose. 

Different parts of the work at Mill Hill led to interest in specific aspects of 
cognitive functioning, much of which reappears in various features of the 
tests described in this handbook. Thus the Vigotsky test (see Chapter 2) was 
first used at Mill Hill in an attempt to probe creative exploration of unfamiliar 
material. The question of how far people are able to learn is related to certain 
aspects of T-M-K and of the Carl Hollow Square. Similar possibilities in 
relation to verbal material are exploited in Wechsler and elsewhere. It was 
recognized from the outset that practically all aspects of performance could 
be affected by projection of emotion. Careful observation of performance 
could be used as a means of exploring such disturbances of behaviour, and 
still further information, particularly in relation to discrepancies between 
capacity level and effectiveness level, could be obtained by the application 
of prompting techniques, such as had been employed by Goldstein (11) with 
organic patients. A better-known precedent exists also in Rorschach limits
testing (19, pp. 12-15) to which the procedure is analogous in intention. The 
aim may be interpreted as an attempt to discover, without interference with 
primary performance, the extent to which the subject is able to integrate 
information obtained from sources other than the material itself (i.e. primarily 
from the examiner) with his own spontaneous response to the requirements 
of the test situation. Systematization of prompting techniques remains an as 
yet unfulfilled project, though their experimental manipulation was carried 
a considerable distance by Trist in his battery for examining the effects of 
cerebral lesion. Experience has suggested that in different circumstances the
optimum balance between prompting and non-intervention varies. Repeated 
reference to the necessity of recognizing that each test situation presents its 
peculiar features will be found to occur throughout this book. 

A further aim in the work at Mill Hill was to link the new techniques to 
commonly used existing tests, preferably with norms derived from the same 
populations. From the point of view of practical psychometrics, it seemed 
sound policy to view the group approach as basic, in the sense that it could 
be easily applied to large numbers, with the individual approach reserved for 
the study of special cases, or as a means of filling in the detail lacking at a
first impression. An opportunity to carry out this twofold approach presented
itself at Pemberley, where generous time-budgeting allowed work to be done 
in a manner unusually leisurely yet thorough-going. A somewhat similar 
situation later presented itself at Garston (No. 14 W.O.S.B.), where an ex- 
tended investigation into the reliability of W.O.S.B. techniques was carried 
out in 1945. Both projects, and certain others carried out at special Boards, 
were of course based on the general standardization data assembled and 
analysed at the Research and Training Centre (R.T.C.) of the War Office 
Selection Boards. It should be stressed, once again, that the techniques 
described in this book were first fully developed in an operational setting, the 
nature of which determined the choice of tests and the constitution of the 
populations on which standardization was carried out. 

At Pemberley, the main requirement was alternative approaches to the 
assessment of intelligence, supplementary to those in ordinary use in the 
Services. For administrative reasons it was considered desirable to retain the 
central feature of the Services test battery—Raven's Progressive Matrices, 
1938 series (35).¹ An additional and contrasted test was already available in
the ‘Reasoning’ test (S.P. Test, 45), but since the native language of the 
majority of the candidates was other than English, it was necessary to develop 
versions in a number of languages. A brief account of these is given in 
Appendix IV. 

Many other considerations pointed to the view that the two written tests 
by themselves could not be regarded as sufficient. Doubts regarding cultural
influences were not confined to the ‘Reasoning’ test. In the event it seemed that 
these doubts were largely unfounded, but it was also felt that difficulties such 
as the limited availability of suitable written tests and absence of facilities for 
training personnel to administer them fluently in all relevant languages made 
it imperative to provide opportunities for personal contact in a testing situa- 
tion of the type involved in the administration of Terman-Merrill and similar 
tests, and of course of performance tests in general. That performance tests 
had advantages in the context seemed self-evident: the linguistic factor in the 
administration could be reduced to a minimum, and scope would be given 
for ‘concrete’ intelligence to function. 

¹ Later replaced, for a short period only, by the 1943 series (see p. 95). Where, in the text
or tables, no date is mentioned, reference is to the 1938 series.

It was therefore necessary to choose suitable tests; a single test was felt to
be insufficient, since most of the established performance tests, even those of
proven reliability, were known on the basis of correlational and factorial
studies to sample different cognitive functions from those covered by the
commonly accepted tests of ‘intelligence’. This was particularly true of tests
of the form-board or pattern construction type, and it was to meet this
deficiency that it was decided to introduce and simultaneously to develop tests
of conceptual thinking which might be supposed to conform to the noegenetic
principles established by Spearman in his earliest work on intelligence. Two
tests were utilized: a variant of the Vigotsky test (Hanfmann and Kasanin's
‘Test of Concept Formation’) and the Trist-Hargreaves test (see Chapter 3),
an entirely new test based on Trist's ‘Ambiguous Shapes’ (43), along with
elements from his modification of the Weigl test. The Trist-Hargreaves test
was introduced to serve as an ‘easier’ alternative to the Vigotsky test,


C. J. C. Earl. Later a modification of Alexander's Passalong Test (see Chapter 6) was
added to the battery, principally to act as an alternative to the
Hollow Square, which was rather time-consuming.


the psychological staff were not
precluded from observing parts of the programme (they were, in fact,
encouraged to do so), it would not have been possible, even if full data had


The general procedure at W.O.S.B.s, whereby information collected from
various sources was collated and a final assessment of suitability was reached,
has been described briefly by Morris (31) and more fully, along with a good


of the ‘personality pointers’ based on projective tests compared. It
frequently happened that as a result of this conference the psychiatrist felt
he had obtained enough fresh information to cause him to modify his opinion
in certain respects, and possibly to alter the rating he would report at the
Board.
Each of the chapters that follow is devoted to a single test, and is arranged
(with minor variations) according to the following plan:


difficult situations, which, in comparison with better-known techniques,
arise rather frequently


CHAPTER 2


The Semeonoff-Vigotsky Test


The material of the Semeonoff-Vigotsky test is essentially the same as that of 
Hanfmann and Kasanin’s Concept Formation Tests (15). It consists of 
twenty-two blocks, varying in colour, shape (of cross-section), area of cross-
section (hereinafter referred to as ‘size’), and thickness (‘height’ in Hanfmann
and Kasanin's description).

There are five colours (red, blue, yellow, green, and white), six shapes 
(triangle, square, trapezium, hexagon, circle, and semicircle), two sizes (in 
the ratio of approximately 2 : 1), and two thicknesses (in about the same 
ratio). On the basis of the dimensions, the blocks may be said to fall into 
four categories: ‘large and thick’, ‘small and thick’, ‘large and thin’, and
‘small and thin’. The discovery of this fourfold classification constitutes the 
problem set by the test. The colours and shapes may therefore be considered 
as non-relevant variables, which must be disregarded in arriving at the correct 
solution. 

In a sense, therefore, the precise selection of colours and shapes used does 
not materially affect the essential nature of the problem. On the other hand, 
interrelationships between shapes and colours will be found occasionally to 
exert a rather subtle influence on a given subject’s progress towards a solution. 
Consequently, it should be noted that the distribution of colours in our 
version is very different from that of the Hanfmann-Kasanin test. It was
partly for this reason, and partly in recognition of other differences in material 
and treatment, that it was decided to revert to the name ‘Vigotsky’, modified 
to ‘Semeonoff-Vigotsky’, as a means of distinguishing it, in references to 
published work and elsewhere, from the Hanfmann-Kasanin test. A detailed 
comparison of the two sets of material is given at the end of this chapter. 

We are not here concerned to give an account of the history or evolution 
of the technique. The original sources are somewhat inaccessible, but the 
essential information is given by Hanfmann and Kasanin (15, 16) and by 
Rapaport (34), who remarks that it might properly be called the ‘Ach-Saharov-
Vigotsky-H-K test’. The basic idea goes back to Ach, whose work on the
nature of normal thought processes required the use of material which could
be manipulated in various ways, according to the capacity or level of sophisti-
cation of the individual. Saharov’s contribution appears to have been to 
modify the technique in a way which made it suitable for work with children.
Up to this point the ‘test’ approach had not yet been adopted. This came 
with Vigotsky, who introduced the notion of a specific ‘correct’ solution to be 
achieved. Vigotsky’s interest lay primarily in the disturbances of thought 
processes which accompany schizophrenia, and it was in this field, too, that 
Hanfmann and Kasanin mostly worked. The present writers’ interest in the 
assessment of the performance of normal subjects represents a partial return 
to the earlier standpoint, while at the same time recognizing that variations 
in the quality of performance (using this term in the sense which does not 
necessarily imply valuation) have diagnostic significance (see, in general, 
Semeonoff and Laird, 41). A discussion of these and of anomalies in per- 
formance is best postponed until after a consideration of the detailed Instruc- 
tions which follow. These are adapted and expanded from various duplicated 
typescripts, some of which have had fairly wide circulation in that form. 


MATERIAL 

The twenty-two blocks are, as already stated (p. 8), of two ‘sizes’ and two
thicknesses. In cross-section the ‘large’ blocks are about two and a quarter
times the area of the ‘small’ blocks; this is also the ratio of the two
thicknesses (18 mm. and 8 mm. respectively).

Each of the four groups formed by the combination of the size and thick- 
ness criteria is identified by a number which is marked on the underside of 
each block belonging to that group. Colours and shapes are represented in
the various groups as follows:

Group 1 (large and thick): red triangle, blue square, green square, white
trapezium, red circle.
Group 2 (small and thin): red triangle, yellow triangle, green trapezium,
white hexagon, yellow circle, blue semicircle.
Group 3 (large and thin): yellow triangle, blue square, blue trapezium,
green trapezium, green circle, white semicircle.
Group 4 (small and thick): green triangle, yellow square, white hexagon,
red circle, yellow circle.
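
For readers who wish to keep records or check a set of blocks by machine, the listing above can be written down as data. The following is a minimal sketch only; the names `Block`, `GROUP_KINDS`, `GROUP_CONTENTS`, and `BLOCKS` are illustrative assumptions and form no part of the test material.

```python
from collections import Counter
from typing import NamedTuple


class Block(NamedTuple):
    colour: str
    shape: str
    size: str        # "large" or "small" (area of cross-section)
    thickness: str   # "thick" or "thin"
    group: int       # number marked on the underside


# Size and thickness of each numbered group, as listed in the text.
GROUP_KINDS = {
    1: ("large", "thick"),
    2: ("small", "thin"),
    3: ("large", "thin"),
    4: ("small", "thick"),
}

# Colour and shape of every block, group by group (hypothetical encoding).
GROUP_CONTENTS = {
    1: ["red triangle", "blue square", "green square",
        "white trapezium", "red circle"],
    2: ["red triangle", "yellow triangle", "green trapezium",
        "white hexagon", "yellow circle", "blue semicircle"],
    3: ["yellow triangle", "blue square", "blue trapezium",
        "green trapezium", "green circle", "white semicircle"],
    4: ["green triangle", "yellow square", "white hexagon",
        "red circle", "yellow circle"],
}

# Expand the listing into one record per block.
BLOCKS = [
    Block(*name.split(), *GROUP_KINDS[group], group)
    for group, names in GROUP_CONTENTS.items()
    for name in names
]

# Number of blocks in each group.
group_sizes = Counter(block.group for block in BLOCKS)
```

Counting the records in this way gives a quick check that a set is complete: twenty-two blocks in all, in groups of five, six, six, and five.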

It is convenient, although not essential, to display the blocks on a ‘tea- 
cloth’. (This is an informal name for a sheet of board of grey or off-white 
material with circular central portion or ‘pool’, in which the blocks are 
placed at the beginning of the test, and four corners, one for each group.)




EXPLANATION OF PROBLEM 


The use of a set form of words is not recommended. It should be made clear 
that what the test requires is that the blocks be divided or classified into four 
groups, in such a way that all the blocks in any one group have something in 
common that distinguishes them from those in the other groups. 

It is often difficult to make the nature of the problem clear at the outset, 
but there is no objection to stressing the fact that classification is required; 
pains should indeed be taken to get this idea across. If thought appropriate 
recourse may be had to explanation by analogy: e.g. that domestic objects 
may be classified by use or by material or by ownership, etc. Such expressions 
as ‘common quality or qualities’ should, however, be avoided, since this may 
give a premature hint that two criteria have to be combined. 

If (and only if) the subject asks whether there must be the same number of 
blocks in each group, it should be pointed out that since there are twenty-two 
blocks, equal groups cannot be made. 

The subject should be warned that various possibilities are likely to present 
themselves, so that more than one system of classification will seem to be 
possible. The examiner should explain that there is one complete, logical, and 
consistent solution which ‘most people will admit is better than any other’. 
The subject should be encouraged to try out any solution he is considering. 
Even if he is not confident that it is the correct solution or is not entirely
satisfied with it, he should nevertheless put it forward, as otherwise time may 
be lost and, although there is no time-limit, credit is given for a rapid solution. 
The blocks may, of course, be moved about freely, provided they are not 
turned over so as to expose the underside. 

The system of identification should then be explained, and one block,
the green triangle of the small/thick type, turned as a sample and placed in
isolation from the others in one corner of the ‘tea-cloth’; the number should
be left showing throughout the test. It is pointed out that when the correct
solution has been achieved all the blocks grouped together will bear the same
number, in this case 4. It is made clear that this number merely identifies the
group, and (if desired) that a nonsense syllable or other ‘name’ would have
done just as well. Each time an incorrect solution is offered it will thus be 
possible to give the subject a ‘clue’ by turning over one block. These clues 
will let the subject know that certain blocks belong to different groups (or 
to the same group), and the nature of the relevant differentiae will become 
progressively apparent. Further attempts at a solution must take these known 
facts into account. 




ADMINISTRATION 


When the examiner is satisfied that the subject has understood what is re- 
quired of him the signal to begin is given. At the same time the stop-watch 
should be started, but unobtrusively, as it is undesirable to increase stress by 
emphasizing the time element. Notes should be taken of everything the 
subject does or says: the tentative arrangements he makes, comments passed, 
etc. It has proved impracticable to provide a proforma for recording the 
subject’s performance; the examiner will no doubt evolve his own methods. 
Some system of shorthand or of symbols to represent the various blocks will 
be almost essential. 

As soon as the subject signifies that he is offering a solution the examiner 
should note the time, and (unless the correct solution has been reached) 
turn up one other block from the group in which the subject has placed the 
sample block. The subject is reminded of how this ‘clue’ is to be used. It may 
be necessary to point out that nothing is implied about the other blocks or the 
other groups, some or all of which may be correctly or wrongly placed, as the 
case may be. 

The subject then proceeds to try again, and each time a solution is offered 
another clue is given, i.e. another block is turned up. 

The order in which clues are given must be decided empirically by the 
examiner, but in general one should turn a block as different as possible from 
those already showing. Thus, for example, the first additional block turned 
should if possible be of the large/thin category. If, as often happens, the 
subject’s first attempt is based on shape, and only triangles have been grouped 
with the sample block, the block to turn is the large thin yellow triangle. If, 
on the other hand the group also includes the trapeziums (probably on the 
basis of these having been regarded as ‘incomplete’ or truncated triangles) one 
should turn the large thin blue trapezium (not the green one, because the 
sample is itself green). 

The next block turned should, if possible, be of either Group 2 or Group 3; 
the fourth should leave one block exposed from each of the four groups. If 
further clues are necessary, they should be given in such an order as to leave 
exposed as nearly equal numbers as possible of blocks of each kind. It is a
good plan, as soon as a clue has been given, to rehearse all possibilities
mentally, and to try to decide in advance which block ought if possible to be
turned next; or at any rate to narrow down the range of possible next clues.
The order in which clues are given obviously depends on what the subject 
has already done, and it is impossible to ensure that the same amount of 
information will be afforded by any clue as compared with others in the
sequence or with the alternatives which present themselves at any given point
in the subject’s progress towards a solution. One can, however, adopt the 
working principle of affording the subject the minimum additional informa- 
tion at each step by turning a block which differs from those already exposed 
in as many as possible of the variables, probably in the following order of 
precedence: size, thickness, shape, colour, ‘circularity’, geometrical regu- 
larity. (The two latter terms are defined in the section on ‘defensible alterna- 
tives’ p. 17, below). Conversely, when a second (or subsequent) block has to 
be turned of a category that has already provided a clue or clues, this should 
be as similar as possible to those already exposed.
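
The working principle just stated can be sketched in code. What follows is one possible reading, not a prescribed procedure: it assumes that a candidate is preferred if it differs from every exposed block in size, then in thickness, then in shape, then in colour, and it leaves out the derived variables ‘circularity’ and geometrical regularity. The names `Block`, `PRECEDENCE`, `difference_key`, and `suggest_clue` are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Block(NamedTuple):
    colour: str
    shape: str
    size: str
    thickness: str


# Assumed order of precedence, taken from the text (derived variables omitted).
PRECEDENCE = ("size", "thickness", "shape", "colour")


def difference_key(candidate: Block, exposed: Sequence[Block]) -> Tuple[bool, ...]:
    """For each variable, True if the candidate differs from every exposed block."""
    return tuple(
        all(getattr(candidate, var) != getattr(e, var) for e in exposed)
        for var in PRECEDENCE
    )


def suggest_clue(candidates: Sequence[Block], exposed: Sequence[Block]) -> Block:
    """Pick the candidate that differs most, comparing variables in precedence order."""
    return max(candidates, key=lambda b: difference_key(b, exposed))


# Worked example: the sample is the small, thick green triangle, and the
# subject has grouped the remaining triangles with it.
sample = Block("green", "triangle", "small", "thick")
triangles = [
    Block("red", "triangle", "large", "thick"),
    Block("red", "triangle", "small", "thin"),
    Block("yellow", "triangle", "small", "thin"),
    Block("yellow", "triangle", "large", "thin"),
]
first_clue = suggest_clue(triangles, [sample])
```

Under these assumptions the sketch selects the large thin yellow triangle, in agreement with the example given earlier in this section. The examiner's judgement must of course override any such mechanical rule.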

In the rare event of its being impossible to provide a clue from a group in
which blocks have already been turned (e.g. Group 4, only, already correct, 
but other groups incorrect) two pieces of different kinds from one of the other 
groups that the subject has made must be turned. This of course counts as 
two clues for scoring purposes (see ‘Scoring’ p. 14, below). 

If a block is turned accidentally, or ostensibly so, it is probably best to 
leave it exposed, even if the subject claims not to have seen the number. 
This of course counts as a clue. If it is a block differing from one of those 
already exposed in thickness or size alone, the reliability of the final score 
must be regarded as questionable. 


PROMPTING 


Clues must be given only when a solution is offered. It has been suggested 
(e.g. by Hanfmann and Kasanin (16)) that clues should be given, if necessary, 
at stated intervals, say every five or ten minutes, if the subject has not mean- 
time spontaneously offered a solution. This seems inadvisable, since it makes 
the clues component of the score too directly a function of the time taken. It 
is also, of course, likely to affect the subject’s attitude. In particular, if it is 
intended to make use of the tentative norms provided, the administration 
procedure here described must be followed. 

It will, of course, often be found that the subject appears to be experiencing 
a blockage, or he may even profess unwillingness or inability to proceed to a 
solution. In such a case the best advice to the examiner is to exercise patience, 
and to the subject, to persevere; but if it really looks as if an impasse has 
been reached, the advantage of receiving a clue, even from a more or less 
random solution, should be pointed out. This sometimes produces an un- 
favourable reaction, but this is in itself not without interest. If the subject 
then proceeds too plainly under protest, it would probably be wise to discount 
the scoring aspect of the test, and to use it as a diagnostic situation only. The
success of the test situation, viewed in whatever light, must ultimately stand
or fall by the tact and skill of the examiner. 

Particularly is this so in respect of the amount and nature of prompting 
and encouragement given. It will be found that a certain minimum will be 
almost inevitable at the stage of ‘explanation of solution’ (see following 
section). It must be left to the individual examiner to decide whether his 
own contribution to the test situation is likely to have influenced the subject's 
success and consequently his score. One should establish one's own conven- 
tions, always remembering that they should be flexible enough to suit the 
person tested and the purpose for which the test is being applied on the 
occasion in question.

Prompting should, nevertheless, be kept within reasonable bounds, and
anything that may be regarded as calling attention to specific relevant 
characteristics of the blocks or groups should be avoided. In the late stages 
of an application of the test it is probably permissible to invite the subject to 
‘isolate’ the blocks already turned and ‘ask himself’ in what respects those 
bearing the same number are alike and in what ways they differ from those 
with other numbers (see layout shown in plate facing p. 16). Such canaliza- 
tion of the subject's thought along systematic lines is often immediately 
efficacious, and it would obviously be inadvisable to prompt in this way in 
the early stages of an application from which it is desired to derive a score. 
Again, if the subject, verbalizing aloud, says, ‘There are only two thicknesses,
so it can’t be that’, one is tempted to say ‘Why not?’, and in certain
circumstances the temptation need not be resisted. Exceptionally, also, a subject
may be invited to revert to a partially correct approach which he has tried 
out at an early stage and found unfruitful. 

If a solution is obviously correct in principle but contains one block mis- 
placed owing to what may reasonably be interpreted as an oversight, the 
subject may be allowed the chance of correcting himself, by putting to him a 
question such as, ‘Are you sure that is quite right?’ If he corrects his mistake 
at once, it is not necessary to count any extra clue. 


EXPLANATION 


When the correct solution has been reached the watch is stopped and the 
subject is asked to explain the basis of the classification as it now stands (not 
the stages by which he has arrived at it). On the degree of understanding re- 
vealed by the explanation depends the component of the score known as 
‘Grade of Solution’. 

Hanfmann and Kasanin (15, 16) give a system of scoring based on Time
and Clues alone, and it was Semeonoff’s intention in the early stages of his
work with the test to adopt this method. It was found, however, that correct 
solutions were sometimes achieved without a realization of the principles 
involved, and it was in order to allow such performances to be assessed in a 
manner more in accord with the results of valid intelligence measures that 
the ‘grade (of solution)’ component was introduced. Subsequent reference to 
the later work of Hanfmann and Kasanin has shown that similar considera- 
tions had led them to the modified system of ‘scoring’ (more properly, per- 
haps, ‘evaluation’) there described. 

The grades of solution distinguished are given in the section on Scoring,
below. Allocation of the proper grade can only be made following inquiry 
and, usually, prompting. It is probably true to say that only Grade I solutions 
(and not always even these) can be scored quite independently of prompting, 
of which the subject may or may not, of course, be able to take advantage. 
In cases of signal lack of success, as defined below, it may be necessary to 
ask the subject to repeat the task, quickly and without the mediation of clues. 


SCORING 


As already indicated, a score on the Semeonoff-Vigotsky test is made up of 
three components: Grade (sometimes referred to as ‘Basic score’), Time, and 
Clues. Since short time and a small number of clues obviously indicate 
effectiveness of performance, it is convenient to score Grade in the same way. 
Low scores, consequently, rate high. 

At different stages of experience with the test, various weightings of the 
three components have been proposed. A full account of these is given by 
Semeonoff and Laird (41). The method here to be described represents a 
compromise between accurate statistical weighting and convenient simplifica- 
tion. It may be remarked in passing that any validity coefficients involved are 
modified only in the second decimal place when changes in weights are 
applied. 

During the work of standardization, six grades of solution were distin- 
guished; the present method reduces these to five. For convenience it has been 
assumed that the step between successive grades represents what might be 
termed an ‘equal interval’ of difficulty. This is an assumption that would be 
difficult to justify on statistical grounds, but it was felt that in any case the 
important consideration was that the grades should be arranged in a sequence 
corresponding to an unambiguous progression from a fully adequate to an 
inadequate recognition of the principle underlying the solution. The contraction
of the range from six to five grades incidentally resolved doubt at the
only point at which it did not appear certain that the grades were correctly
ordered. That the distribution of grade scores was closer to J-shaped than 
to the normal was not thought to be a serious objection, since the application 
of the grade score was in any case intended to penalize lack of insight rather 
than to help to discriminate in more general terms. 

The grades now recognized are: 


Grade I (Score 0). Any explanation which clearly recognizes the double 
dichotomy of size and thickness, or otherwise takes into account the fact that 
two criteria have to be combined. Combinations of size and volume, thick- 
ness and volume, size and weight, or thickness and weight may be accepted. 


Grade II (Score 10). An imperfect realization of the dichotomy, e.g. ‘Each
group consists of pieces which have the same volume, with height and size
as additional helps in the sorting.’ Or a Grade III solution (see below)
amended, on prompting, to what would have been Grade I if given
spontaneously. References in earlier Instructions to solutions in which one of two
criteria is described as subsidiary to the other perhaps require amplification:
such a solution should not be regarded as Grade II if it is clear that the
subject is really explaining how he arrived at the dichotomy. On the other
hand: ‘It’s really just a match of thickness, with size discriminating within 
these two main groups’ would probably be correctly regarded as Grade II, 
since it seems that the second part of the definition is the subject’s way of 
coping with what appears to him an arbitrary requirement, i.e. four groups 
instead of the ‘obvious’ two. 


Grade III (Score 20). This combines two groups previously distinguished: 

(a) More or less intuitive apprehension in terms of one element, usually 
‘size’ (or volume). Such a statement as ‘All about the same size’ should 
always be followed up with supplementary questions, which will usually lead 
to asking the subject what distinguishes the large/thin group from the small/ 
thick (since both have the same volume). An adequate reply will, of course, 
raise the solution to Grade II. Such a statement as ‘These are bigger than
these, and these are bigger than these’ even if further supplemented by ‘And 
these, again, are thicker than these’ will not qualify, since it depends on inter- 
relationships between groups, which is not properly a mode or category of 
classification. Similarly with solutions in which members of one group are 
regarded as ‘halves’ of those in another; this also savours of ‘concrete refer- 
ence’, a term used in previous Instructions to cover such expressions as ‘they 
occupy the same space’, ‘they would pack closely together’, which are also 
Grade III.




(b) However adequate an explanation may be in other respects it must be 
rated as Grade III if the subject persists in adding that he tried to work in 
something else as well, e.g. each colour and each shape represented in each 
group. Quite apart from the fact that it is impossible to achieve this end, 
this type of report indicates that the subject has failed fully to realize that 
distribution of the blocks into the four groups is not relevant to classification. 


Grade IV (Score 30). If no explanation, or only a very vague one (e.g. ‘They
just look as if they belonged together’) is forthcoming, even with prompting 
and encouragement, the examiner should replace all the blocks in the ‘pool’, 
numbered side downward, and ask the subject to regroup them quickly into 
their four kinds. If the subject is successful, a Grade IV solution may be 
credited. If not, the performance is rated 


Grade V (Score 40). This is hardly to be regarded as a ‘solution’, in the strict 
sense, but since the task has, ostensibly at least, been accomplished it seems 
logical to allocate a score to the performance. 


To sum up, it may be said that Grade I represents a perfect solution, and 
Grade II a near-perfect one; from either of these it should be possible to re- 
construct the grouping without hesitation. Grade III represents an imperfectly 
conceptualized solution, from which it would not always be entirely clear 
which blocks went into each group; Grade IV a perceptual solution; Grade 
V an apparently accidental success. 

The final score is obtained by adding together the Grade score, the time 
taken, correct to the nearest minute, and the number of clues multiplied by 
five. This is equivalent to the formula: 

Score = Time + 5 (Clues) + 10 (Grade - 1).

The number of clues received is of course the number of blocks which have
been exposed, not counting the sample block used as starting-point.
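
The arithmetic of the final score may be illustrated by a short sketch. The function name `final_score` and the table `GRADE_SCORES` are illustrative assumptions; the time is assumed to have been rounded to whole minutes already, and the clue count to exclude the sample block.

```python
# Grade scores as given in the list of grades: Grade I = 0 ... Grade V = 40.
GRADE_SCORES = {1: 0, 2: 10, 3: 20, 4: 30, 5: 40}


def final_score(minutes: int, clues: int, grade: int) -> int:
    """Score = Time + 5 (Clues) + 10 (Grade - 1); low scores rate high."""
    if grade not in GRADE_SCORES:
        raise ValueError("grade must be between 1 and 5")
    if minutes < 0 or clues < 0:
        raise ValueError("minutes and clues must be non-negative")
    return minutes + 5 * clues + GRADE_SCORES[grade]


# Hypothetical record: solved in 25 minutes with 4 clues, explanation rated Grade II.
example = final_score(minutes=25, clues=4, grade=2)  # 25 + 20 + 10 = 55
```

The figures in the example are invented for illustration and are not drawn from the norms.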


DURATION OF TEST 


The time taken for the correct solution to be reached varies very much indeed,
and if it is necessary to budget time in a research programme or otherwise,
at least forty minutes should be allowed. Even this may not be enough, and
many subjects are able to work without discouragement for considerably
longer. On the other hand, beyond this limit the subject's (or the examiner's)
patience tends to wear thin, and it may be taken as a working rule to regard
a performance which lasts over an hour as a failure, and to end the session
with the tact necessary in the interests of therapeutic closure. For these
reasons the Semeonoff-Vigotsky is not to be considered as a desirable feature
of routine clinical procedure, nor should it be included in a testing programme
in which an appreciable proportion of the subjects may be expected to fall
below, say, the 70th percentile in general intelligence.

If plenty of time is available, a good deal of interesting information may
often be obtained from discussion with the subject after the close of the
administration proper, particularly if his performance has included the
production of some of the ‘defensible alternatives’ which form the subject of
the next section.


SEMEONOFF-VIGOTSKY TEST: Blocks laid out on the ‘tea-cloth’ (see p. 9).
‘Cue’ blocks are shown ‘isolated’, i.e., the subject has placed apart from
the others those blocks which have been identified for him, following
incorrect solutions, with a view to studying their appearance and
attempting to infer which are the relevant differentiae.

Note that in the set used for the photograph the identifying numbers
appear on small white squares, but in the sets supplied in connection
with this book, the numbers are punched on.

Facing p. 16


TRIST-MISSELBROOK-KOHS: test in progress.


DEFENSIBLE ALTERNATIVES 


A feature of the Semeonoff-Vigotsky test, partly deriving from the early
stages of its development, is that there are certain alternative solutions, which
must be regarded as wholly or partially defensible on logical grounds. When
one of these is produced in the course of a subject’s performance, the
examiner is faced with the problem of how to deal with it. Subjects whose
personality pattern contains paranoid or oppositional trends may react unfavourably
if told that a solution that appears ‘good’ to them is nevertheless unacceptable.
If such a subject is inclined to dispute the examiner’s ruling, loss of rapport is
likely to result, and time may be lost in argument. Other subjects, again, are
content to ‘abide by the rules’, and accept, without undue dismay, the fact
that their solution is not the prescribed one. As in other difficult situations in
administration of the test, it must be left to the examiner to decide whether
the subject's attitude has been sufficiently disrupted to make a score
calculated in the usual way unreliable.

Rapaport (34, p. 468) lists a number of ‘usual and acceptable attempts at
solution’ which he would appear to regard as not indicative of impairment of
conceptual thinking. The present writer would be inclined to question this
assumption in respect of one or two of the solutions noted, and the list in any
case clearly includes many that could hardly be defended on logical grounds.

The following are the principal alternative solutions that may be regarded
as ‘defensible’:

(i) 3-sided/4-sided/6-sided figures/curvilinear figures (i.e. circles and
semicircles).
(ii) Triangles plus trapeziums (interpreted as ‘incomplete’ triangles, on
the analogy of circles and semicircles)/squares/hexagons/curvilinear
figures.
(iii) Thickness or size combined with geometrical regularity (i.e.
contrasting the triangles, squares, circles, and hexagons with the trapeziums
and semicircles).



(iv) Colour/white, combined with size, thickness, or geometrical regularity.

(v) ‘Primary’ colours/‘non-primary’ (i.e. in the artist's sense, contrasting 
red, blue, and yellow with green and white), combined with one of the 
other variables. This has never actually been offered, in the writer’s 
experience, although the use of the concept of primary colour has been 
observed in tentative groupings. 

The above are noted in roughly descending order of frequency. Some lead 
to such unequal groupings (e.g. colour/white with thickness gives a ‘group’ 
consisting of one piece) that the subject rejects them out of hand as ‘im- 
probable’. 
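
The inequality of the groupings produced by the shape-based alternatives can be checked by a simple count. The sketch below tallies the blocks listed in the section on Material under alternatives (i) and (ii); the names `SHAPES`, `BY_SIDES`, and `BY_FAMILY` are illustrative assumptions.

```python
from collections import Counter

# Shape of each of the twenty-two blocks, group by group, from the Material list.
SHAPES = (
    ["triangle", "square", "square", "trapezium", "circle"]                   # Group 1
    + ["triangle", "triangle", "trapezium", "hexagon", "circle", "semicircle"]  # Group 2
    + ["triangle", "square", "trapezium", "trapezium", "circle", "semicircle"]  # Group 3
    + ["triangle", "square", "hexagon", "circle", "circle"]                   # Group 4
)

# Alternative (i): 3-sided / 4-sided / 6-sided / curvilinear figures.
BY_SIDES = {
    "triangle": "3-sided", "square": "4-sided", "trapezium": "4-sided",
    "hexagon": "6-sided", "circle": "curvilinear", "semicircle": "curvilinear",
}

# Alternative (ii): triangles plus trapeziums / squares / hexagons / curvilinear.
BY_FAMILY = {
    "triangle": "triangles+trapeziums", "trapezium": "triangles+trapeziums",
    "square": "squares", "hexagon": "hexagons",
    "circle": "curvilinear", "semicircle": "curvilinear",
}

counts_i = Counter(BY_SIDES[s] for s in SHAPES)
counts_ii = Counter(BY_FAMILY[s] for s in SHAPES)
```

On this count alternative (i) yields groups of 5, 8, 2, and 7 blocks, and alternative (ii) groups of 9, 4, 2, and 7, so that both are noticeably less even than the prescribed solution.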

That other types of ‘defensible’ solutions will sometimes turn up is not 
impossible. Thus Fosberg (10), describing a form of application of the test 
that allowed the subject to produce as many solutions as satisfied him as 
being ‘possible’, quotes eighteen as the maximum number of solutions 
accepted by himself as ‘correct’. No indication, however, is given as to the 
criterion of ‘correctness’. 


INADEQUATE SOLUTIONS 


Most solutions other than the ‘defensible’ may be regarded as having some 
sort of interpretative significance. 

Rapaport classifies most of these under his broad heading of ‘Unusual
attempts at solution’. Certain features of Rapaport’s classification, however, 
appear to be rather arbitrary, and the following discussion presents a shorter 
list of broader categories. In it an attempt is simultaneously made to call 
attention to the varieties of disturbance of conceptual thinking, or of per- 
sonality maladjustment, with which each is associated. 


1. A partially defensible solution, of a type not mentioned by Rapaport, con- 
sists of a failure to extend a single principle of classification to cover all 
groups. Thus, all the thick blocks may constitute one group, with the thin 
blocks divided into three groups on some unobjectionable basis, e.g. circles/ 
semicircles/rectilinear figures. The flaw here is of course that the ‘thick’ 
group could perhaps equally well have been divided into similar sub-groups 
(although in the quoted example not in quite the same way). Rather similar, 
but on a lower level, is a form solution of the type triangles/circles/squares/ 
‘all the rest’. Either of these types of solution will be recognized by an 

1 It may be noted in passing that the last of these is impossible if the colourings of the
Hanfmann-Kasanin set are used, since in it all the white blocks are ‘regular’. 



intelligent and co-operative subject as imperfect; whether it can be interpreted 
as indicative of a tendency ‘to take things easy’ will of course depend on other 
aspects of the test performance. Where one of these groupings, particularly 
of the second type, genuinely appears to the subject to be an adequate 
attempt at a solution, conforming to the requirements, it must be regarded as 
indicating a low level of conceptual thinking, irrespective of the final score. 


2. An approach that clearly shows misunderstanding of the problem and 
that is probably the most typical of those used by subjects who entirely fail 
eventually to solve the problem, consists of persistent attempts at distribution, 
i.e. allocating the blocks to four groups without regard to the ‘common
quality’ that unites all the blocks in any one group. In a case of this sort the
subject has usually overlooked or persists in ignoring the fact that the groups 
cannot contain equal numbers of blocks. It is often accompanied by a rudi- 
mentary appreciation of the size element, and the subject is worried by the 
fact that certain shapes are not provided in certain sizes. This of course 
essentially represents a ‘failure’ (or at least partial ‘rejection’ in the Rorschach 
sense). The interpretation will depend on other information, i.e. it should
not be regarded as necessarily indicative of low intelligence level alone. 


3. A more pathological type of response is one based on an extremely con- 
crete approach to the material, often shown in the form of a misapprehension 
to the effect that one must ‘make’ something with the blocks. It was to 
minimize the suggestion conveyed by the ‘names’ (lag, bik, mur, cev) of the
blocks that Semeonoff replaced these by the numbers now used. (A similar
change has been made by Rapaport and other workers.) Thus, a French-
speaking subject tested by the writer insisted on constructing a wall with the
blocks he associated with the sample ‘mur’, and other objects from the other 
blocks. This, as Rapaport points out, is a strong indication of schizoid 
tendency; it may of course appear with numbered blocks or even in the ‘free’ 
approach used by Fosberg. 


4. The approach just described often shades into solutions based on sym- 
bolism or synaesthesias based on the shapes, or more frequently the colours, 
of the blocks. This clearly indicates a disorganization of the conceptual 
approach, but if it is to be interpreted analogously to Rorschach symbolic 
content it probably indicates emotional disturbance curbed by outer control. 


5. There is also a hint of symbolism in a primitive type of classification 
associated with low intelligence level which makes use of some form of loosely 
conceived association between the ‘simpler’ geometrical figures (triangles,
circles, and, less frequently, squares); these are often described as ‘primary’
shapes. Similarly, hexagons may be grouped with circles because ‘they are
“really” circles’. Or hexagons may be grouped with triangles: not, in our
experience, so much because, as Rapaport suggests, hexagons can be ‘built up’
of triangles, but
because of an ill-defined association of the numbers 6 and 3. Classifications 
of this type frequently suggest that the subject is a person who attempts to 
function on an intellectual level beyond his capacity, i.e. one who has an 
over-high level of aspiration. 


6. Some subjects spend much time meticulously matching each block with
the others for thickness, or attempting to compare or estimate areas of
cross-section, or occasionally volumes of blocks of the two ‘intermediate’
groups (i.e. large/thin and small/thick). Such behaviour is a clear indication
of compulsive thinking, and it is usually enough to note its occurrence and
discourage the subject from further minute comparison, particularly of
thickness. Thus, it is permissible to use such an expression as ‘Yes, those are
the “same”.’ The ‘tea-cloth’ originally used by the writer was drawn out on
lightly-ruled graph-paper, and this was sometimes used by subjects in an
attempt to calculate size, etc. While recourse to such aids is interesting it is
probably better not to allow the opportunity to occur.

It may be remarked that since the dimensions of blocks in the present set 
were carefully calculated, the examiner may if necessary confidently insist 
that those in each group are indeed equal in size. This does not appear to be 
true in the case of the Hanfmann and Kasanin material (see p. 25, below). 
Differences in ‘phenomenal’ size mislead some subjects; thus the ‘large’ 
triangles look bigger than any other pieces (presumably because of the length 
of the side), and the ‘small’ triangles are often grouped with the ‘large’ 
squares, on the basis of length of side. Interaction of ‘size’ and thickness also 
causes some blocks to look larger or smaller than others from which they 
differ in thickness only. All these factors must be borne in mind when
assessing an individual performance, which may be ‘better’ than its score 
suggests. 


7. A step further still in the direction of the obsessive-compulsive syndrome
is indicated by minute examination of the surface of the blocks for differences 
in texture or irregularity of shape. If, as in the standard set, the blocks have 
a sufficiently smooth finish there is little justification for the former, but some 
privately made sets certainly suggest such possibilities—which will of course 
occur to a highly compulsive subject irrespective of the finish. The temptation 
to label such a subject as ‘suspicious’ is probably often justified; whether the 
category of paranoid may be attached will depend on confirmatory indications
in general attitude and behaviour. As in the circumstances just discussed, this
type of close examination should be discouraged as irrelevant.


OTHER INTERPRETATIVE INDICATIONS 


In addition to the above modes of searching for a solution there are certain 
other behaviour patterns, attitudes, etc., which may accompany any of these 
approaches. 

An easily observed contrast is that between excessive and often apparently 
aimless manipulation of the blocks, and passive inspection of the layout 
without any attempt to reinforce speculation or deductive thinking with 
direct comparison or experimentation on a perceptual level. It is a fairly
safe inference to equate these contrasted attitudes with over-reliance on 
‘doing’ and ‘thinking’ respectively. Cross-cultural validation would probably 
yield useful information here. Reference is made in Appendix V (‘Case E’) to
the test performance of a highly educated French-speaking aristocrat who 
overstepped the time normally allowed for the test, courteously but firmly 
rebutting offers of ‘forced’ clues with the remark ‘C’est pas logique’. An 
educational background that stressed the empirical maxim solvitur ambulando 
would no doubt do much to foster the contrary attitude. 

Outstanding cases of intransigent adherence to logicality do more than 
anything else to undermine the examiner's patience, but if the test is being 
used as a means of exploring the subject's modes of thought in a problem 
situation the occurrence of this pattern is of great interest. Occasionally it 
may reach the extreme stage of preventing the subject from continuing any 
further; it must be left to the examiner's ingenuity to devise a suitable in- 
centive. The relevance of this type of behaviour to rigidity is obvious. 

Completely random manipulation seldom occurs; when it does it usually
indicates insecurity or anxiety, and is often accompanied by ‘accidental’ 
turning of the blocks. Caution, however, must be exercised in adducing 
anxiety; the question of whether or not it should be regarded as specific to 
the test situation itself should be carefully examined. 

Other indications of anxiety include expressions of doubt of one’s own 
capacity, needless requests for confirmation of the instructions, and over- 
emotional reaction generally to the problem at large or to lack of progress. 
Similar patterns may occur in relation to other tests described in this book, 
and if several tests are being administered the examiner should look for con- 
firmation, or, if it is absent, some clue as to why any particular test rather 
than the others should arouse anxiety. 

Protestation (self-excuse in advance) is a phenomenon which will be
sufficiently familiar to anyone who has tested adult subjects with test-
material of an unfamiliar type. Occurring in excess such self-depreciation may
connote a depressive trend, but it is more likely simply to indicate strong 
ego-defence, perhaps with an oppositional element. The Vigotsky test has the 
additional feature that it lends itself exceptionally readily to protestation 
after the event: a large proportion of subjects can be relied upon to say (per- 
haps in all sincerity) that they were ‘looking for something much more com- 
plicated’. To emerge with good grace from a Vigotsky performance in which 
one feels one has not done oneself justice is a sure sign either of exceptional 
stability or of extreme psychopathic acquiescence.

A final (and frequently occurring) pattern which Rapaport diagnoses as
indicating insecurity is the total abandoning of a near-correct solution which
has been shown to be incorrect, very often with reversion to an approach on 
a more primitive conceptual plane. The commonest example is a solution in 
which the significance of the thickness element has been recognized and used
in some way other than the correct way, which is then scrapped in favour of 
an imperfect solution based usually on shape. In so far as such behaviour 
may be regarded as regressive, it is probably correct to see in it elements of 
insecurity, but it seems to be more closely related to the behaviour of those 
people who are unable or unwilling to examine all the possibilities of a 
situation before passing on to something else. Such categories of everyday 
speech as ‘impatience’, ‘superficiality’, and the like can probably be validly 
applied in cases of this sort.

Much of the information indicated in the above discussion will have been
elicited in conversation with the subject after the application of the test, on
the analogy of the Rorschach ‘inquiry’. Some subjects will, of course, make
spontaneous comments on what they are thinking and doing, and they should 
not be discouraged from doing so. The examiner must not, however, upset 
the time element in the score by questioning the subject during the adminis- 
tration of the test if it is clear that the subject is making adequate progress 
without his intervention. 


POINTS FOR FURTHER INVESTIGATION 


Most writers on the Vigotsky test, principally reviewers of the Hanfmann-
Kasanin test and literature, have expressed scepticism regarding its validity
as a ‘test’, in the narrow sense, and its usefulness in a clinical battery or for 
routine use. We are willing to concede this latter point, and even to add a
rider to the effect that it is peculiarly unsuitable in any circumstances in 
which there is likelihood of collusion between candidates at, say, a selection
board. The principle, if the subject is able to grasp it, and that applies even
to some of the few who are unable to solve the problem, is readily com- 
municable to others. Similar considerations make it almost impossible to 
demonstrate reliability by any of the standard methods, even by re-test with
a parallel form, since the principle of the double dichotomy is easily remem- 
bered, or recalled even if the details of how it applied to the problem originally 
posed should happen to be forgotten. Laird (22) did devise a parallel form 
and showed that in some cases it was possible for subjects to approach the 
second test uninfluenced by the original application, so far as could be 
determined. In such cases consistency of approach usually accompanied both 
test performances, which supports the reliability (if not necessarily the 
validity) of the diagnostic indications. 


Further work with parallel forms would seem to be desirable. It would be
particularly valuable if information could be obtained as to whether removal
of some of the ambiguities or irregularities in the material (e.g. elimination 
of the semicircles on the ground that they are not entirely curvilinear, or 
replacement of white by a fifth chromatic colour) would help in determining 
the ‘required’ solution by reducing defensible alternatives and also make for 
greater objectivity in scoring.

Reference has already been made to the question of whether effectiveness 
of verbalization should affect the score in a test that purports to be of the 
‘performance’ type. Our view is that if one discounts the requirement that a
performance test should present as closely as possible a non-verbal counter- 
part of an intelligence test involving verbalization the question does not arise. 
Experience has shown that an invitation to verbalize often facilitates progress 
towards a solution, and if this is left to the discretion of the examiner, an 
element of unreliability is introduced. The Ladd Blocks test (21), which re-
sembles Vigotsky very closely, requires the subject to state the principle used
after each attempt at a solution; the effect of such a practice on Vigotsky 
performance might be investigated. In the Ladd Blocks test again points are 
awarded for ‘Grade of Solution’ in much the same way as in the present version
of Vigotsky, but making use of an additional piece, which has to be ‘fitted in’ 
to the solution, i.e. allocated to the correct group. This feature could easily 
be incorporated into the Vigotsky test, and the treatment of the additional 
block could probably also be used as a basis of discussion with the subject, 
particularly in cases where distribution or other not strictly conceptual 
thinking has played a large part in the performance. 

A still further possibility which would probably lead to a different range of 
possible attitudes and possibly a different array of norms would be to present 
the situation as a game with arbitrary rules rather than as an intellectual 
problem. In other words, while there would still be a single ‘required’ solution,
no claim that this was necessarily the best solution would be made. It would
probably be desirable to lay more stress on the time element, and the subject’s 
task would more clearly involve judgement of the best balance to strike 
between time and clues. (By way of illustration one may cite the case of a 
subject of very high intellectual and academic standing who immediately 
recognized the possibility of conserving time at the expense of clues; his 
method was to divide the blocks into four random groups by two sweeping 
movements of the hands, and to repeat this twice, barely looking at the clues 
he received at each stage. When four blocks had been exposed he was able 
to subject these to a rapid analytical examination, and arrive at the correct 
solution in record time.) The objections against using such an artificial and 
arbitrary situation as the basis of an intelligence measure are of course 
obvious, but the personality correlations of performance under these con- 
ditions would be interesting. 


SPECIFICATIONS 


As noted earlier in this chapter (p. 8), our material differs quite appreciably 
from that of the Hanfmann-Kasanin test. 


Table 1 COMPARISON OF COLOURS IN THE SEMEONOFF-VIGOTSKY 
AND HANFMANN-KASANIN TESTS 
Block Large Thick Small Thick Large Thin Small Thin 
Triangle Red Green Yellow Red 
Yellow 
Green Red White Blue 
Yellow* 
Square Blue Yellow Blue — 
Green 
Red Yellow* Green — 
Blue 
Circle Red Red Green Yellow 
Yellow 
Red* Blue Blue Yellow* 
White 
Semicircle — — White Blue 
— — Yellow Green 
Trapezium White — Blue Green 
Green 
Yellow — Red* Red 
Green 
Hexagon — White — White 
— White* — White* 




Colour 

The colours in the two versions are compared in Table 1, Semeonoff- 
Vigotsky colours being shown in the upper lines, and Hanfmann-Kasanin 
in italic type in the lower lines.

It will be noticed that only 8 colour attributions (marked *) are common to
both versions; to a user fully accustomed to one version the other arrange-
ment of colours has a very unfamiliar appearance indeed. Apart from specific
differences, the following points may be noted:

(1) Semeonoff-Vigotsky has all five colours represented in only one of the 
four groups, whereas Hanfmann-Kasanin has two such groups. Consider- 
ations of this sort are likely to affect the performance of a subject who is 
trying to ‘distribute’ the blocks by colour (see p. 19).

(2) If one examines the pairs of ‘equivalent’ blocks (i.e. identical in all
respects except colour), shown bracketed in Table 1, it will be seen that 
Semeonoff-Vigotsky has two pairs with similar colourings, whereas in 
Hanfmann-Kasanin the colour pairings are all different. Subjects have 
occasionally been known to attempt to ‘interpret’ these colour pairings, and
although this form of bizarre conceptualization is uncommon it suggests 
that these colour relationships (although probably fortuitous in origin) must 
be regarded as an essential feature of the material. 


Table 2 DIMENSIONS OF SEMEONOFF-VIGOTSKY BLOCKS

                                     ‘Large blocks’      ‘Small blocks’
Block       Linear dimension         mm.   Area          mm.   Area          Ratio of
                                           (sq. mm.)           (sq. mm.)     areas
Triangle    Length of side           43    800           28    339           2.36
Square      Length of side           29    841           19    361           2.33
Circle      Diameter                 33    855           22    380           2.25
Semicircle  Diameter                 47    865           31    377           2.29
Trapezium   Longer parallel side     48                  32
            Shorter parallel side    24    864           16    384           2.25
            Distance between sides   24                  16
Hexagon     Length of side                               12    374




Dimensions 

Examination of the Hanfmann-Kasanin material shows rather wide dis- 
crepancies both in the cross-section areas of blocks belonging to the same 
group and in the ratio of the areas of large to those of small pieces of the 
same shape. The dimensions of the Semeonoff-Vigotsky pieces, shown in 
Table 2, were calculated in such a way as to eliminate such anomalies as far 
as possible. 

It may be noted that if the sides of the triangular blocks were increased to
44 mm. and 28.5 mm. respectively, a better equivalence of areas (838 sq.
mm. and 364 sq. mm.) could be achieved, together with a slightly better
ratio (2.30 : 1). In view of the greater ‘phenomenal’ size of the triangles (see
p. 20) it is probably preferable to allow these to remain a trifle under-sized.

The thicknesses are: ‘thick’ blocks 18 mm., and ‘thin’ blocks 8 mm. These
give a ratio of 2.25 : 1, which is as close as possible to that between
the ‘sizes’. The Hanfmann-Kasanin thickness ratio is 2.4 : 1, while the size
ratio appears to vary between about 2 and 3.25.
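
The areas in Table 2 can be checked from the linear dimensions by ordinary plane geometry. The following is a minimal sketch, not part of the original manual: the function name `face_area` and the dictionary `TABLE_2` are illustrative, and the triangles and hexagons are assumed to be equilateral and regular respectively (an assumption that agrees with the tabulated areas).

```python
import math

def face_area(shape, *dims):
    """Area (sq. mm.) of a block's face from its linear dimensions (mm.)."""
    if shape == "triangle":    # assumed equilateral; dims = (side,)
        return math.sqrt(3) / 4 * dims[0] ** 2
    if shape == "square":      # dims = (side,)
        return dims[0] ** 2
    if shape == "circle":      # dims = (diameter,)
        return math.pi * (dims[0] / 2) ** 2
    if shape == "semicircle":  # dims = (diameter,)
        return math.pi * (dims[0] / 2) ** 2 / 2
    if shape == "trapezium":   # dims = (longer side, shorter side, distance between)
        longer, shorter, height = dims
        return (longer + shorter) / 2 * height
    if shape == "hexagon":     # assumed regular; dims = (side,)
        return 3 * math.sqrt(3) / 2 * dims[0] ** 2
    raise ValueError(f"unknown shape: {shape}")

# Linear dimensions from Table 2: (large block, small block)
TABLE_2 = {
    "triangle": ((43,), (28,)),
    "square": ((29,), (19,)),
    "circle": ((33,), (22,)),
    "semicircle": ((47,), (31,)),
    "trapezium": ((48, 24, 24), (32, 16, 16)),
}

for shape, (large, small) in TABLE_2.items():
    big, little = face_area(shape, *large), face_area(shape, *small)
    print(f"{shape:<10} {big:7.1f} {little:7.1f} {big / little:5.2f}")

# Hexagons occur in the small size only.
print(f"{'hexagon':<10} {face_area('hexagon', 12):15.1f}")

THICKNESS_RATIO = 18 / 8  # 'thick' to 'thin' blocks, both in mm.
```

Small differences between the computed values and the printed figures are to be expected, since the table gives whole numbers of square millimetres.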

It may also be mentioned that in the Hanfmann-Kasanin material the 
‘large’ and ‘small’ trapeziums differ slightly in shape, and that the Semeonoff- 
Vigotsky trapeziums are a little narrower (i.e. have a shorter relative dis- 
tance between the parallel sides) than either. 

Privately made sets of the material that we have seen, purporting to be 
exact copies of either of the standard sets, have sometimes diverged con- 
siderably from the specifications, and while it is unlikely that such variations 
will have had far-reaching effects, quantitative results obtained with such 
sets should be treated with caution. 




CHAPTER 3


The Trist-Hargreaves Test 


At Pemberley the Trist-Hargreaves test was introduced, and came to be 
regarded, as an ‘easier alternative’ to Vigotsky. Some indication of the extent 
to which it may properly be regarded as such may be had from the corre- 
lational and factorial data quoted in Chapter 8. It was assumed that transfer 
would be highly operative; few subjects, therefore, were tested with both. 

The Trist-Hargreaves test, here published for the first time, stems even 
more directly than Semeonoff-Vigotsky from Trist’s pioneer work at Mill
Hill. The material of the present test was elaborated from a much simpler
form suitable for administration to subjects showing severe intellectual
deterioration. The original work, described by Trist and Trist (44), was carried 
out with a group of treated G.P.I. patients, comparisons being made with 
the performance of normal and neurotic subjects. Five tests in all were 
used, elements from two of which were combined in the Trist-Hargreaves test 
as later developed. 

The more direct prototype of the Trist-Hargreaves was ‘Ambiguous Shapes’, 
a set of twelve flat pieces, consisting of solid squares, hollow squares, solid 
circles, and hollow circles—three identical pieces of each kind. The test was 
administered in three parts: 

Part I. Sorting into four groups (by identical shape). 

Part II. Sorting into two groups.

Part III. Sorting into two groups another way. 

Adequacy of verbalization was scored as well as adequacy of sorting, and 
prompting was used. 

The other test contributing to the evolution of the Trist-Hargreaves test 
was Trist’s modification of Weigl’s Form-Colour Sorting Test (49). The 
material for this too consists of twelve flat pieces, representing all the possible 
combinations of three shapes (triangle, circle, square) with four colours 
(red, blue, green, yellow). Since something like Rorschach colour shock was 
sometimes induced, white pieces were added; white had earlier been chosen 
for Ambiguous Shapes for the same reason. Here also prompting technique
was employed, and an elaborate inquiry which included the presentation of
two additional pieces to test for ‘equivalence’ (cf. Ladd’s practice), one 
being of a shape already represented but of a new colour (brown), the other 
introducing a new shape (diamond).

As will readily be surmised both these tests are well within the scope of 
even low-grade normal intelligence, and failure may be regarded as a reliable 
indication of impairment, associated with organic lesion, or with neurosis, 
the latter particularly when success on Ambiguous Shapes is accompanied 
by failure on Colour-Form, which in addition to its possibly disturbing 
colours lacks the element of conceptual shift. 

The Trist-Hargreaves test in its present form represents the second stage 
in the evolution of a more formidable series of problems making use of the 
principles underlying the two tests just described. 

The first step consisted of adding colour to the basic forms of the Ambi-
guous Shapes, not only on the main surface, as in Weigl, but on the edge, the
pieces being now given thickness to allow for this. The intention of this
elaboration (suggested by Hargreaves) was to allow for ambiguity in respect 
of colour classification, since the edge colour differed from the surface 
colour. 

A second part was also added to the test; in this a sorting process, following 
a given pattern, was required to be carried out. In this may be seen an 
attempt to produce in terms of concrete material a problem of the matrix 
type. In so far as it makes use of a sample it also shows the influence of 
Vigotsky and of Goldstein’s approach, in which a process of sorting is 
directed towards the completion of an aggregation of objects which have some 
quality in common. 

Originally it was intended that the colours used should be white and two 
colours of very low saturation indeed—cream and pale duck-egg blue, in 
fact hardly more than shades of off-white. It was soon discovered that under 
test conditions, particularly, of course, in poor or artificial light, these differ- 
ences almost invariably passed undetected. Accordingly normally saturated 
yellow and blue were substituted.

Further slight modifications, mainly in the number and order of the 
questions asked, were introduced by Semeonoff during the first routine use 
of the test at Pemberley. 


MATERIAL 


The material of the test consists of twelve wooden pieces, 0.4 inches in
thickness. Six of the pieces are square and six circular, the side of the square
and the diameter of the circle each being 1.8 inches. Three of each kind are
‘solid’ and three ‘hollow’, i.e. are pierced with a centre hole of the appropriate
shape, one inch across.

Each piece has the flat surfaces² painted in one colour and its edges in
another; hollow pieces have their inner edges painted the same colour as the 
outer edges. 

The pieces may thus be said to vary in shape, ‘integrity’ (a term which 
may conveniently be used to denote whether a piece has or has not a hole), 
‘top’ (or flat surface) colour, and edge colour.

The colours of the pieces are as follows (here as elsewhere a colour men- 
tioned simpliciter refers, of course, to ‘top’ colour): 

Blue solid square with white edges 

White solid square with yellow edges 

Yellow solid square with blue edges 

Blue hollow square with yellow edges 

White hollow square with blue edges 

Yellow hollow square with white edges 

Blue solid circle with yellow edges 

White solid circle with blue edges

Yellow solid circle with white edges 

Blue hollow circle with white edges 

White hollow circle with yellow edges 

Yellow hollow circle with blue edges

It will be noted that only half the possible colour combinations are used,
e.g. there is no white solid square with blue edges; also that pieces differing 
only in shape or only in integrity always carry different colour combinations. 
(See also section on Interpretation, p. 37 below.) 
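
The two properties just noted can be verified mechanically. The sketch below is illustrative only: the tuple encoding of the pieces and the helper name `differ_in_one_form_attribute` are assumptions introduced here, not part of the test material.

```python
from itertools import permutations, product

# (shape, integrity, top colour, edge colour) for the twelve pieces listed above
PIECES = [
    ("square", "solid", "blue", "white"),
    ("square", "solid", "white", "yellow"),
    ("square", "solid", "yellow", "blue"),
    ("square", "hollow", "blue", "yellow"),
    ("square", "hollow", "white", "blue"),
    ("square", "hollow", "yellow", "white"),
    ("circle", "solid", "blue", "yellow"),
    ("circle", "solid", "white", "blue"),
    ("circle", "solid", "yellow", "white"),
    ("circle", "hollow", "blue", "white"),
    ("circle", "hollow", "white", "yellow"),
    ("circle", "hollow", "yellow", "blue"),
]

COLOURS = ("blue", "white", "yellow")

# Every conceivable piece: 2 shapes x 2 integrities x 6 ordered
# (top, edge) pairs of distinct colours.
possible = [
    (shape, integrity, top, edge)
    for shape, integrity in product(("square", "circle"), ("solid", "hollow"))
    for top, edge in permutations(COLOURS, 2)
]

def differ_in_one_form_attribute(a, b):
    """True if a and b differ in shape only or in integrity only."""
    return (a[0] != b[0]) != (a[1] != b[1])

# Property 1: exactly half the possible combinations are used.
half_used = len(set(PIECES)) * 2 == len(possible)

# Property 2: pieces differing only in shape, or only in integrity,
# never carry the same (top, edge) colour combination.
no_shared_colours = all(
    a[2:] != b[2:]
    for a in PIECES
    for b in PIECES
    if differ_in_one_form_attribute(a, b)
)

print(half_used, no_shared_colours)
```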


CONTENT OF TEST 


The test consists of two parts: 

Part I. Groups 

Part II. Pairs. 

In Part I (6 items) the subject is required to divide the pieces into four,
two, or three groups. In Part II (10 items) a ‘pair’ of pieces bearing a demon-
strable relationship to one another is presented and the subject is required
to produce other pairs in which the two pieces bear the same relationship to
one another.

1 If more convenient, 1¾ and ⅜ inches may be used.
2 In the original set only one surface was coloured, the ‘underside’ remaining unpainted.
It is thought unlikely that there is any advantage or disadvantage in this, or that it might
lead to differences in approach.


ADMINISTRATION: PRELIMINARY 


The subject should be seated in a good light; in daylight preferably not 
with his back to the window, and in artificial light not immediately under 
the light source. 

The pieces are scattered in front of the subject, and he is invited to examine 
them carefully. If he does not pick them up of his own accord he should be 
encouraged to do so, and if necessary to turn them over. (This is done to 
allow the subject every chance of noticing the edge colours. His attention 
should not, however, be specifically called to these. It is noteworthy [and 
probably of some interpretative significance] that a few subjects entirely fail 
to notice the edge colours, even so.) If thought desirable the subject may be 
told that this is only to make sure that he ‘has seen them properly’, and that 
he will not be required to memorize the colours. 


ADMINISTRATION: PART I


1. When the subject has examined the pieces, he should, without further 
preamble, be given the instruction: 

‘Now divide them into four groups.’

Few subjects will have difficulty in doing this, although a surprisingly 
large proportion will show some hesitation. No prompting should be given 
until and unless an incorrect solution has been offered, or unless the subject 
has asked for further information. If he asks whether each group should 
contain an equal number a simple affirmative answer should be given. It 
is also permissible to indicate that a single principle should govern the group- 
ing. This is a difficult point to convey but it is unlikely that a subject who 
experiences difficulty at this stage will be capable of framing this question or 
profiting by the answer. 


2. Without disturbing the pieces from the positions in which the subject has 
left them, say,

‘Now divide them into two groups.’

The reason for not mixing the pieces between each problem is that the way
that the subject uses or fails to use his previous placement is informative. If, 
however, the subject himself mixes the pieces no objection should be raised. 




3. The third problem is introduced with the words, 

‘Now (into) two groups another way.’

A note should be kept of the principle used for each grouping. The expecta-
tion is that sorting by shape will precede sorting by integrity, but the scoring
(see below) is unaffected if this order is reversed.

The wording of the remaining questions is similar to what has gone before, 
with the necessary modifications: 

4. ‘Now divide the pieces into three groups.’
5. ‘Now into three groups another way.’
6. ‘Now three groups another way still.’

For the second half of Part I the subject is of course expected to change 
over from classification by shape (including integrity) to classification by 
colour. This, however, is a step that is sometimes difficult to take, particularly 
for an impaired subject. The most usual faulty alternative is to leave one of 
the groups just made as it is, and to split the other into two, thereby partially re- 
verting to Question 1. Such a solution may be rationalized in fairly acceptable 
terms, particularly in the case where the three groups made are solid circles/ 
solid squares/hollow pieces, i.e. ‘two kinds of whole (or “full”) pieces, and
pierced pieces’. When a subject produces such a grouping he should be told 
that the groups should all contain the same number of pieces; if it is con- 
sidered necessary or desirable one may point out that this grouping is ‘not 
really different from the previous ones’. 

The most usual order is classification by top colour, by edge colour, and by 
combination of colours (i.e. blue pieces with white edges plus white pieces 
with blue edges forming a single group, etc.). It may be noted that this last 
possibility was not envisaged in the earliest draft of the test, in which there 
were only two questions requiring division into three groups. Classification 
by combination of colours represents a higher order of conceptualization
than anything considered so far, and is in fact the main discriminating feature 
in Part I, at least for subjects of normal intelligence. Nevertheless it is not 
infrequently given first, perhaps because it may have been recognized as 
‘better’ (i.e. more strictly logical) than either of the other possibilities. As
before, it is immaterial for scoring purposes which is given first. 
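
The six groupings expected in Part I, and the commonest faulty alternative, can be sketched as follows. This is an illustration under assumptions: the tuple encoding of the pieces, the labels, and the helper `group_sizes` are introduced here and form no part of the test itself.

```python
from collections import Counter

# (shape, integrity, top colour, edge colour) for the twelve pieces
PIECES = [
    ("square", "solid", "blue", "white"),
    ("square", "solid", "white", "yellow"),
    ("square", "solid", "yellow", "blue"),
    ("square", "hollow", "blue", "yellow"),
    ("square", "hollow", "white", "blue"),
    ("square", "hollow", "yellow", "white"),
    ("circle", "solid", "blue", "yellow"),
    ("circle", "solid", "white", "blue"),
    ("circle", "solid", "yellow", "white"),
    ("circle", "hollow", "blue", "white"),
    ("circle", "hollow", "white", "yellow"),
    ("circle", "hollow", "yellow", "blue"),
]

def group_sizes(key):
    """Sorted sizes of the groups formed by classifying each piece with key."""
    return sorted(Counter(key(piece) for piece in PIECES).values())

# Classification principles for Questions 1 to 6 (labels are illustrative)
PRINCIPLES = {
    "shape and integrity": lambda p: (p[0], p[1]),         # four groups
    "shape": lambda p: p[0],                               # two groups
    "integrity": lambda p: p[1],                           # two groups
    "top colour": lambda p: p[2],                          # three groups
    "edge colour": lambda p: p[3],                         # three groups
    "colour combination": lambda p: frozenset(p[2:]),      # three groups
}

for name, key in PRINCIPLES.items():
    print(f"{name:<20} {group_sizes(key)}")

# The usual faulty answer to Question 4: solid circles / solid squares /
# hollow pieces. The groups are unequal, which is why the subject is told
# that the groups should all contain the same number of pieces.
faulty = group_sizes(lambda p: p[0] if p[1] == "solid" else "hollow")
print(f"{'faulty alternative':<20} {faulty}")
```

Treating the colour combination as an unordered set is what places, for example, blue pieces with white edges and white pieces with blue edges in a single group.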


ADMINISTRATION: PART II 


When Part I has been completed, the subject should be told that the remainder 
of the test will consist of problems of a slightly different kind. In Part II a
‘sample’ or ‘pattern’ pair of pieces is presented, and the subject is told: ‘Here
is a pair of pieces; make five more pairs like that.’



Occasionally this is misunderstood, the subject thinking that either or 
both of the sample pieces may or must be used again. If this happens or if 
considered necessary as an alternative to the instruction as quoted, the 
examiner may say: ‘Divide all the other pieces into pairs on this principle (or
“following this pattern”)’. If the subject still fails to understand, some form
of words such as the following may be used: 

‘You will notice that these two pieces (the sample pair) are alike in some
ways and different in some ways; try to make five other pairs in which the
pieces are alike and different in the same ways as these.’

In all questions in Part II, the blue solid square (with white edges) is used 
as a ‘starting-point’. It is paired with others in the following order: 

1. Blue hollow square (with yellow edges)
2. Blue solid circle (with yellow edges)
3. White solid circle (with blue edges)
4. Yellow hollow square (with white edges)
5. Blue hollow circle (with white edges)
6. White hollow square (with blue edges)
7. Yellow hollow circle (with blue edges)
8. Yellow solid circle (with white edges)
9. White hollow circle (with yellow edges)
10. White solid square (with yellow edges).

The ‘pairing principle’ (i.e. relationship between the sample pieces, to be 
reproduced) in each case is as follows: 

1. Same shape and top colour; different integrity and edge colour.
2. Same integrity and top colour; different shape and edge colour.
3. Same integrity, different shape; top and edge colours reversed (i.e. top
   of each matching edge of other).
4. Same shape and edge colour; different integrity and top colour.
5. Same top and edge colour; different shape and integrity.
6. Same shape, different integrity; reversed colours.
7. Shape, integrity, and both colours all different.
8. Same integrity and edge colour; different shape and top colour.
9. As (7).
10. Same shape and integrity; different top and edge colour.
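
The ten relationships listed above follow mechanically from the four attributes of each piece. The following Python sketch is offered only as an illustration: the `Piece` record and the `relationship` function are hypothetical names introduced here, not part of the test. It compares the starting piece with each of the ten pieces in the order given.

```python
from typing import NamedTuple


class Piece(NamedTuple):
    shape: str      # "square" or "circle"
    integrity: str  # "solid" or "hollow"
    top: str        # colour of the flat surface
    edge: str       # colour of the edges


def relationship(a: Piece, b: Piece) -> dict:
    """Describe how piece b relates to piece a (the 'pairing principle')."""
    return {
        "same_shape": a.shape == b.shape,
        "same_integrity": a.integrity == b.integrity,
        "same_top": a.top == b.top,
        "same_edge": a.edge == b.edge,
        # 'reversed' means the top of each matches the edge of the other
        "reversed": a.top == b.edge and a.edge == b.top,
    }


# The 'starting-point' used in every Part II question.
START = Piece("square", "solid", "blue", "white")

# The pieces paired with it, in the order of Questions 1 to 10.
SAMPLES = [
    Piece("square", "hollow", "blue", "yellow"),
    Piece("circle", "solid", "blue", "yellow"),
    Piece("circle", "solid", "white", "blue"),
    Piece("square", "hollow", "yellow", "white"),
    Piece("circle", "hollow", "blue", "white"),
    Piece("square", "hollow", "white", "blue"),
    Piece("circle", "hollow", "yellow", "blue"),
    Piece("circle", "solid", "yellow", "white"),
    Piece("circle", "hollow", "white", "yellow"),
    Piece("square", "solid", "white", "yellow"),
]

for number, piece in enumerate(SAMPLES, start=1):
    print(number, relationship(START, piece))
```

Run as written, the sketch gives the same description for Questions 7 and 9, in agreement with the entry 'As (7)' in the list.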

It must not be assumed that every subject will work out all these relation- 
ships. Much may be achieved, at all levels of abstraction, by a process of 
elimination. Further, it should be noted that in all cases the correct answer 
is in fact over-determined, i.e. it is not necessary to use all the data to arrive 
at the correct solution. 

[Plate: CARL HOLLOW SQUARE: Subject engaged on Problem 12: he is seen to be
about to fit two pieces together in his hand, in preference to trying out the
same relationship within the square (see p. 62). Note disposition of pieces,
score-sheet, etc. on table (see p. 57).]

[Plate: TRIST-HARGREAVES TEST: A typical lay-out (for Part II, Problem 7). Note
the two columns, each of three ‘like’ pairs. Having found pairing pieces
to each of the ‘solid squares’, the subject has extended the principle to
cover ‘solid circles’.]

In Question 10 the problem must be worded differently, e.g. ‘This time it’s
a little different; how many pairs can you make like this one?’ This change is
necessary because it is of course possible to make only four pairs with the
same shape and integrity.

The ostensibly possible eleventh question (using the third of the solid 
squares) is not included, since the situation it would represent is exactly the 
same as Question 10, and subjects who fail to solve Question 10 satisfactorily 
are usually a little upset, necessitating the tactful application of therapeutic 
closure. (For a discussion of the analogous similarity between Questions 7 
and 9, see below, p. 35). 


SCORING AND PROMPTING 


Part I
Correct groupings are scored as follows:
1. Four groups - 4 points.
2 and 3. Two groups:
   grouping by shape - 4 points
   grouping by integrity - 6 points
   (irrespective of which comes first).
4, 5 and 6. Three groups:
   grouping by top colour - 3 points
   grouping by edge colour - 6 points
   grouping by combination - 9 points
   (irrespective of which comes first).


It seldom happens that a partially correct solution is offered. If this occurs 
and appears to be due to a slip rather than to a failure of classification, the 
subject may be asked ‘Are you sure that’s right?’ If the mistake is corrected 
spontaneously no penalty should be attached, i.e. full credit is given. If it is 
not, a pro rata score for correct groups is allowed. In this context a group is 
considered correct if, although incomplete, it does not contain any piece
that does not belong. Thus, if in a two-group problem one piece only has 
been misplaced, the group containing seven pieces is ‘wrong’ but the group 
containing five pieces is ‘correct’. The score would then be 2 or 3 points, for 
one correct group, according to whether shape or integrity is the principle 
of grouping being employed. 
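
The pro rata rule can be set out as simple arithmetic. The sketch below is an illustration only; the dictionary and function names are hypothetical, and it assumes that 'pro rata' means the full value multiplied by the fraction of groups judged correct, which is what the worked example above (2 or 3 points for one correct group out of two) implies.

```python
# Full-credit values for the six Part I sortings, as listed above.
FULL_CREDIT = {
    "four_groups": 4,
    "shape": 4,
    "integrity": 6,
    "top_colour": 3,
    "edge_colour": 6,
    "colour_combination": 9,
}

# Number of groups each sorting is expected to contain.
GROUPS_REQUIRED = {
    "four_groups": 4,
    "shape": 2,
    "integrity": 2,
    "top_colour": 3,
    "edge_colour": 3,
    "colour_combination": 3,
}


def part_one_score(principle: str, correct_groups: int) -> float:
    """Pro rata score: full value times the fraction of correct groups."""
    required = GROUPS_REQUIRED[principle]
    if not 0 <= correct_groups <= required:
        raise ValueError("correct_groups out of range")
    return FULL_CREDIT[principle] * correct_groups / required


# One piece misplaced in a two-group sorting leaves one correct group.
print(part_one_score("shape", 1))      # 2.0
print(part_one_score("integrity", 1))  # 3.0
```

The sum of the listed full-credit values is 32 points.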

Prompting has been found to be an even more useful, and indeed essential,
feature in Trist-Hargreaves than in Semeonoff-Vigotsky. Particularly is
this so in the case of a diffident or emotionally disturbed subject who may 
not realize how simple the task is, especially in its early stages. Consequently, 
if a true assessment of his capacity for conceptual thought is to be obtained,
it is important to help him over any such initial difficulty. Nevertheless,
unlimited or random prompting is to be discouraged. It is recommended that
prompting should be restricted to the following circumstances: 


Part I (1) If at the outset the idea of a principle of classification seems to be 
causing difficulty it may be explained in simple terms. If this is necessary, the 
solution to question I (1) should be given 2 points instead of 4.


Part I (2-3) If the subject fails at first to grasp the idea that ‘different’ pieces 
may be combined in one group, some such words as ‘Can you not find 6 
pieces which are alike in some way?’ may be used. In such a case the score 
for the first solution offered should be halved. If the subject cannot even then 
offer a solution there is of course no point in asking for a second two-group 
solution. Incorrect sortings are very rare at this point in the test, but if an 
incorrect solution is offered the subject should be asked to explain it, and a 
second attempt may be allowed. If this produces a correct sorting the subject 
should be allowed the opportunity to produce the other correct two-group 
solution. In such a case the initial ‘failure’ should be penalized as for a prompt.


Part I (4-6) The type of error most frequently encountered at this stage 
(groups of 6, 3, 3) has already been mentioned (p. 31 above). Since this may 
be regarded as due to a misunderstanding of instructions which may, perhaps, 
not have been worded quite explicitly enough, no form of penalty for the 
prompting necessary to correct the misapprehension is recommended. 


Total failure to shift from shape to colour represents a disability serious 
enough to transcend scoring problems entirely. If, however, the examiner is 
anxious quite literally to ‘test the limits’ in this connexion he may ask the 
subject (if necessary restoring the earlier groupings for easier reference) 
whether all the pieces previously grouped together were ‘quite alike’, and, 
if not, whether observed differences could be utilized in some way. 

Difficulty with the three-group problems, usually, however, results from 
the subject not having taken sufficient cognizance of the edge colours. If this 
appears to be so, the subject may be asked: ‘Are you sure you examined the 
pieces carefully enough?’ This will usually be sufficient help, and if so the 
value of the edge colour solution should be halved. 

It is difficult to specify the amount of prompting admissible towards the 
colour combination solution. The most usual third approach is in terms of 
‘inner edge’ colour, which is of course the same as ‘outer edge’ colour. Some- 
times the flat surface colour of the solid pieces is interpreted as ‘inner’ colour, 
and if this is allowed a consistent and unique solution can be reached. It
cannot, however, be regarded as a ‘defensible’ solution. It should be neither
accepted nor penalized: if desired, its arbitrariness may be explained to the 
subject. In earlier Instructions it was suggested that ‘6 or 3 points [be] 
awarded according to whether little or much help has been given’. This may 
now be amplified by saying that oblique reference to differences within 
groups already made (cf. last paragraph but one, above) may be considered
admissible at this stage, and interpreted as constituting ‘little’ help. ‘Much’ 
help would involve dividing the pieces into pairs of identical colouring, and 
helping the subject to realize that in the colour groupings the members of 
each pair always went together, but would be associated with certain other 
pairs, according to the principle used. (This point can be fully appreciated 
only with the material in front of one; the student is therefore advised 
to familiarize himself with all possibilities before attempting to administer 
the test.) 


Part II 

In Questions 1-9, each correct pair scores 1 point. 

In Question 10, the score is the number of correct pairs minus the number 
of incorrect pairs; where the latter exceeds the former the score is neverthe- 
less zero. 
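
The two Part II scoring rules stated above may be sketched as follows. The function names are hypothetical and the code is offered as an illustration of the arithmetic only.

```python
def score_question_1_to_9(correct_pairs: int) -> int:
    """Questions 1-9: one point for each correct pair."""
    return correct_pairs


def score_question_10(correct_pairs: int, incorrect_pairs: int) -> int:
    """Question 10: correct minus incorrect pairs, never below zero."""
    return max(0, correct_pairs - incorrect_pairs)


# Three correct pairs offered together with two incorrect ones.
print(score_question_10(3, 2))  # 1
# More incorrect than correct pairs still scores zero, not a negative value.
print(score_question_10(1, 3))  # 0
```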

The earlier Instructions called for a distinction in the scoring of Question 9 
depending on the interpretation of the relationship between the pieces paired 
in Question 7. 

In Question 7, the pairing principle (if fully specified) may be regarded as 
‘different shape, integrity, and top colour’ and either ‘edge of hollow piece 
matching top of solid piece’¹ or ‘edge of round piece matching top of square
piece’. This leads to the result that while the pairings of solid squares and
hollow circles are uniquely determined, hollow squares and solid circles may 
be paired in two different ways. Exactly the same applies, mutatis mutandis, 
to Question 9, and it was previously laid down that full credit could be given 
in Question 9 only if the pairing of the hollow squares and solid circles was 
different from that produced in Question 7. Without going into the reasoning 
underlying this requirement, it may be remarked that an equally valid argu- 
ment could be advanced for the contrary position: i.e. full credit only for 
the same pairing of hollow squares and solid circles. (The interested reader 
may care to work out the problem for himself.) It is now proposed to abandon 
the distinction and to allow full credit for either solution in each case. The 
effect of this change on the norms has been found to be negligible. 

¹ This is the pairing principle shown fully worked out in the plate facing p. 33.

Prompting is seldom necessary in Questions 1-9 of Part II, except possibly
at the outset. If the fuller explanation noted in the preceding section (p. 32)
is necessary, no credit should be given for the first question of Part II.

Question 10, on the other hand, frequently causes perplexity. The most 
frequent source of error is that many subjects who have produced a perfect 
performance up to this point (possibly on the basis of an imperfect grasp of 
the principle involved) will be satisfied with a solution consisting of the three 
correct pairs along with two others which comply ‘as far as possible’ with the
inferred criteria, the altered wording having been interpreted as a sort of 
challenge. If the subject asks whether this is permissible a negative answer 
should be given, and no penalty exacted. A solution based on exact colour 
relationships (see section on Interpretation, p. 37, below) leads to one correct 
pair. Questions about colour relationships, often raised at this point for the 
first time, should be answered non-committally, e.g. ‘It’s up to you’.


ALTERNATIVE (UNSATISFACTORY) SOLUTIONS 


It is likely that further experience of the test in the clinical setting will reveal 
bizarre and otherwise imperfect methods of handling the material in addition 
to those so far encountered. Some such have already been described
incidentally to the foregoing discussion. Others include the following:


1. As in the case of Semeonoff-Vigotsky (see p. 20) a few subjects look for 
minute differences of size, texture, or colouring. Since such behaviour is
usually meaningful it is probably desirable that the chance of its occurring 
should not be blocked by insisting on absolute uniformity of cutting of the 
pieces and similar accidental variation. Nevertheless, care must be taken 
not to allow the subject to persist in the belief that minute differences may be 
of significance. 


2. Again as in the case of Semeonoff-Vigotsky, some subjects resort to setting
the pieces on edge, or using them to ‘build’ something. This tendency should 
also be countered; it is less likely to occur if there is no unpainted ‘underside’ 
to the pieces, as was the case in the prototype. 


3. Many subjects of rather low intelligence level appear to tackle the pairing 
problems in terms of colour only, or, more rarely, shape and integrity only; 
or colour and shape may be used and integrity ignored. The diagnostic impli- 
cations are rather different according to whether the tendency is one of 
rigidity, i.e. of attempting to reproduce colour relationships exactly, or of 
slackening of his conceptual framework, i.e. ignoring the need for close
correspondence and being satisfied with loosely defined similarities and
differences. The former tendency would appear to correspond to ‘narrowing’ 
in object-sorting (see Rapaport, 34), and the latter to ‘loosening’. They may 
be regarded as indicative of depressive and schizophrenic trend respectively. 


4. Although perhaps less readily than Semeonoff-Vigotsky, Trist-Hargreaves 
gives scope for sorting by distribution, which is, of course, to be regarded as 
a failure of conceptualization. A three-group division often encountered is 
‘one of each kind’ (i.e. a random allocation of one of each of the basic shape- 
integrity types to each of three groups). When this occurs, following one or 
two correct sortings, it may be pointed out that the sorting just made could 
equally well have been accounted for in terms of the idea the subject is now 
pursuing. 


5. Partially defensible alternatives may turn up from time to time, and must 
be dealt with each on its own merits. Thus a 15-year-old boy of superior in- 
telligence recently made ‘two groups’ by separating the blue pieces plus the 
yellow with white edges from the white pieces plus yellow with blue edges. 
The principle was ‘darker colour on top as against darker colour on the edge’.
This sorting was made with a non-standard set in which the blue was rather 
darker than that now provided, but it might no doubt have occurred even 
with the standard colourings. No interpretation of this particular pattern of 
response is offered, but in like cases one should try to discover whether such
a solution is offered as a deliberate effort to avoid the obvious, or whether 
genuine inability to perceive the obvious is present. In the latter case it may 
be regarded as indicative of ‘lack of common sense’, on the analogy of low D 
in Rorschach.


INTERPRETATION: GENERAL 


Absence of sufficient clinical evidence precludes detailed interpretation of 
the many recognizable patterns of approach in Trist-Hargreaves. Many have 
their analogues in Semeonoff-Vigotsky, and pending validation correspond- 
ing interpretations may be tentatively advanced. It is also suggested, on the 
basis of similar observations by Trist (43), that failure to deal as adequately 
with the colour variables as with the shape variables is indicative of instability. 

The type of failure to deal adequately with colour most frequently en- 
countered is inability to apprehend colour relationships in generalized terms. 
Such subjects usually try to reproduce the exact colour relationships of a 
sample pair in Part II, in terms of top colour only, or of edge colour as well. 
Thus, problem II-3 may be interpreted as ‘blue square with white circle’, and
on this basis only one further pair can be made. The concept ‘blue square’
may then be extended to cover a square piece with blue edges, and similarly for
‘white circle’. Two further pairs can then be made, after which blocking may
occur, or the subject may tacitly shift his ground, loosening his criterion to 
that of ‘blue piece with white piece’. Or again, this may be the original 
criterion, interpreted in terms of top colour only, in which case the subject 
will find himself at a loss when left with only pieces yellow on top. One could 
cite many other similar instances, which often merely indicate low intelligence 
level, but may also stem from emotional difficulties, or from sources associated 
with other forms of disturbance of conceptual thinking. Their full significance 
is unlikely to become evident unless some form of inquiry follows the
administration.

Indications in relation to the obsessive-compulsive syndrome may be 
obtained from the manner in which the subject sets out the pieces, particularly 
in Part II. The nature of the problem and of the material lends itself to a 
methodical layout, and it will probably be found that a majority of subjects 
spontaneously arrange their pairs in columns, either one column of six pairs 
or two parallel columns of three pairs each as shown in Figure II. This must 
on no account, therefore, be taken as showing marked obsessive tendency, 
which will probably only show itself in the manner in which such columns 
are laid out, or in an exaggerated attention to placing the colours in corres- 
ponding or symmetrical positions. We are inclined rather to attach unfavour- 
able significance to a disregard for orderly layout, which is frequently associ- 
ated with over-confidence and superficiality in one's approach to conceptual 
problems. 

A pattern symptomatic of anxiety sometimes develops in an imperfectly 
worked-out attempt at solution, using a systematic layout, particularly when 
the subject does not leave the sample pair in what would normally be
regarded as a ‘key’ position (usually at the ‘top of a column’). In such cases
the subject may ‘forget’ which is the sample pair and separate the two pieces 
that form it. When this occurs it should of course be pointed out to the 
subject at once; the degree of anxiety, or insecurity, present will be evident 
in the nature of his response to correction. 

It may be remarked that studied indifference to methodical layout may, 
paradoxically, itself be an indication of obsessive trend, particularly in 
subjects of high aesthetic sensibility, who may be at great pains to produce a 
pleasing distribution of the pairs or groups over the table as a whole. 


POINTS FOR FURTHER INVESTIGATION 


The main individual feature of the test is the way in which it lends itself to
the use of prompting technique, and it would therefore appear that a first
requirement would be a fuller standardization of the prompting system, 
together with an investigation of the relevance, for scoring purposes, of the 
system of penalties at present used only to rather a limited extent. 

Trist-Hargreaves does not present the same difficulties as regards re-test 
as does Semeonoff-Vigotsky, particularly in Part II. Nevertheless, parallel 
forms would be useful for various purposes, and details of two such will be 
found below. 

No data are available on the relative difficulty of these, but informal 
application for demonstration purposes has strongly suggested that 
there is very little transfer, even when the subject knows that the material 
has been designed to illustrate identical principles in terms of different 
variables.

A variant suitable for group administration has been designed by Lovell 
(24). Whether it tests similar cognitive functions remains to be demonstrated, 
although there is a strong prima facie case for supposing that this is so. On the 
other hand, no group test can evoke the patterns of behaviour or afford 
opportunities for observation of the type to call attention to which is a main 
purpose of this book; consequently group tests intended to act as substitutes 
for any of our tests have been regarded as outside our scope. A whole new 
field of investigation, however, lies open. 


PARALLEL FORMS 


Two alternative sets of material intended to serve as the basis of parallel 
forms of the Trist-Hargreaves test have been produced, and have been tried 
out informally by Semeonoff. In each case three ‘variables’, corresponding 
to Trist-Hargreaves ‘Shape’, ‘Integrity’, and ‘Colour’, are distributed, in an 
analogous fashion, over 12 ‘pieces’. 

(1) A form devised by Mrs Elsie Elliott (née Macdonald) while working 
as psychologist at Shenley Hospital, St. Albans. The material consists of 
circles, squares, and triangles cut out of black or grey cardboard (‘outer’ 
shapes), on which are mounted smaller circles, squares, or triangles of pink 
or pale blue paper (‘inner’ shapes). No ‘inner’ is mounted on an ‘outer’ of the 
same shape as itself. 

(2) A form devised by Semeonoff consists of white cards on which are 
mounted ‘spots’ of green or dark red paper, of two sizes. The spots appear 
in ‘clusters’ of two, three, or four, and each card bears two, three, or four 
such clusters—never the same number of clusters as there are spots in each 
cluster on that card. 



The use of the variables may be summarized as follows: 


                      ‘Two-way’ variables             ‘Three-way’ variable
                      a.              b.
Trist-Hargreaves      Shape           Integrity       Colour
Elliott               Outer colour    Inner colour    Shape
Semeonoff             Colour          Size of spot    Number

Either set of material lends itself to grouping problems exactly analogous
to those of Trist-Hargreaves, with the possible exception that in both cases
the ‘two-group’ problems should perhaps precede the ‘four-group’ problem.
In the Elliott form the third three-group solution (shape-combination) is not
easily verbalized. In the Semeonoff form the number-combination solution
resolves itself into classification by simple number of spots on the card (e.g.
3 x 2 = 2 x 3), but this is seldom realized by the subject.
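
The point about the Semeonoff form can be checked by enumeration. The sketch below is an illustration under the stated constraints only (two, three, or four clusters; two, three, or four spots per cluster; the two numbers never equal); the variable names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

COUNTS = (2, 3, 4)

# Group every admissible (clusters, spots per cluster) combination
# by the total number of spots on the card.
by_total = defaultdict(list)
for clusters, spots in permutations(COUNTS, 2):  # clusters != spots
    by_total[clusters * spots].append((clusters, spots))

for total in sorted(by_total):
    print(total, by_total[total])
```

The six admissible combinations fall into three groups, with totals of 6, 8, and 12 spots, each group containing a combination and its reverse.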

Although neither form has been tried out clinically it seems highly probable 
that carry-over, particularly after some lapse of time, would be negligible. 

Subjects to whom the material has been demonstrated for teaching purposes
almost invariably—even when it is explicitly made clear that the principles 
involved are identical with those of Trist-Hargreaves—fail to apply those 
principles at all readily. Quite apart from the phenomena of blocking it 
would appear that both forms are more difficult than Trist-Hargreaves. This 
would seem to apply in particular to ‘pairing’ problems (i.e. Part II of
Trist-Hargreaves), for which the parallel material appears to be less appropriate
than the original. 


CHAPTER 4


Trist-Misselbrook-Kohs


The version of Kohs Block Designs Test here presented embodies the most 
marked departure yet made from the author's original method (20). It had its 
origin in Trist’s work at Mill Hill Emergency Hospital, to which reference 
has already been made. It was to some extent influenced by the approach 
adopted by Goldstein and Scheerer (12), whose work on the impairment of 
‘abstract attitude’ associated with head injury included an analysis of the 
types of error made by patients in the reproduction of Kohs' designs. Since 
they were not interested in using the technique as a means of arriving at a
quantitative assessment, Goldstein and Scheerer recommended that the 
examiner should feel free to vary the procedure in any way which might give 
added information on the individual case. This freedom of variation, although 
it has its parallel in other techniques described in this manual, was not in- 
corporated in the Trist-Misselbrook test (hereafter T-M-K). A further point 
of contact, however, is that Trist and Misselbrook, like Goldstein and 
Scheerer, were interested in the possibility of using the test to throw light on 
the capacity of the subject to learn. A major consideration, in fact, in the 
construction of the test was to design it in such a way that it might act as a
learning situation. To this end the problems were grouped into distinct 
series, each introducing a new principle. Each series further embodies a 
sequence of shifts, representing in general similar or comparable modifica- 
tions of principle in each series. As far as possible this requirement was com- 
bined with progression of difficulty, a progression which was of course 
duplicated in the succession of series. A still further element relating to learn- 
ing, the use of further trials immediately following the first application of the 
test, was part of this original intention which, indeed, envisaged an application 
of Murray's tripartite criterion of learning (32, pp. 511 et seq.)—the total time 
taken to obtain mastery, the number of errors and the number of repetitions 
of stimuli required. This project was abandoned in favour of a single repeat. 
This was not normally included in the Services use of the test, but was inde- 
pendently added by Semeonoff to the standard procedure at Pemberley. 




Further details of the structure of the various series will be found in the 
section on Principles below. 


MATERIAL 


The material consists of 24 Kohs blocks of standard pattern, i.e. one-inch
cubes, four sides of which are painted with ‘full’ colours (red, white, blue, and
yellow) and two divided along the diagonal (red/white and blue/yellow).

In addition there are five hinged ‘boards’ (one used for demonstration). 
On one inner surface of the board the designs to be reproduced are shown 
on a scale of 1 : 2; the other has ‘frames’ cut out, of the correct size and in 
the correct orientation for the reproduction of the designs by means of the 
blocks (see plate facing p. 17). The boards, or the respective series of problems
which they contain, are designated SAMPLE, A, B, C, and D. Table 3
below summarizes the number and size of the designs on each board, to- 
gether with the time-limits allowed and ‘possible’ score (see section on 
Scoring, p. 48). 


Table 3  T-M-K: SUMMARY OF THE TEST MATERIAL

Board        No. of     Size of    Time-limit    Possible
             designs    design     (secs.)       score
Sample       2          2 x 2      -             -
A            4          2 x 2      90            8
B            5          2 x 2      120           20
C            4          3 x 2      90            24
D            4          3 x 2      180           24
All (Test
proper)      17         -          480           76
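
The bottom row of Table 3 can be reproduced from the four boards of the test proper. The sketch below is an illustration only; the `Board` record and its field names are hypothetical, and the number of blocks per design is taken from the design sizes (2 x 2 = 4, 3 x 2 = 6).

```python
from typing import NamedTuple


class Board(NamedTuple):
    name: str
    designs: int
    blocks_per_design: int
    time_limit_secs: int
    possible_score: int


# The four boards of the test proper, as given in Table 3.
BOARDS = [
    Board("A", 4, 4, 90, 8),
    Board("B", 5, 4, 120, 20),
    Board("C", 4, 6, 90, 24),
    Board("D", 4, 6, 180, 24),
]

total_designs = sum(b.designs for b in BOARDS)
total_time = sum(b.time_limit_secs for b in BOARDS)
total_score = sum(b.possible_score for b in BOARDS)
# Largest number of blocks needed to fill every frame on one board.
most_blocks = max(b.designs * b.blocks_per_design for b in BOARDS)

print(total_designs, total_time, total_score, most_blocks)
```

The totals agree with the table (17 designs, 480 seconds, 76 points), and no board needs more than the 24 blocks supplied.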


Many radical differences from the original Kohs test will be noticed; in
addition to the arrangement of the designs in groups, and the provision of
‘frames’ in which to construct them, it should be noted that:

(i) No design requiring more than 6 blocks (3 x 2) is used, whereas Kohs
goes up to 16-block (4 x 4) designs.

(ii) Time-limits are much shorter, and the unit for timing is the board, not
the individual design. A single application can usually be accomplished
well within fifteen minutes, including ample time for explanation and
demonstration, and the full test, with repeat, in under half an hour.

The designs used are shown in the figures on pp. 44, 45. Three of the
original Kohs 2 x 2 designs (5, 7, 8) are not used, and four entirely new
designs of this size are included; all the 3 x 2 designs are of course also new.
Correspondence with the Kohs designs is as shown in Table 4.


Table 4  RELATION OF T-M-K DESIGNS TO ORIGINAL KOHS DESIGNS

T-M-K        Kohs
Sample 1     1
Sample 2     2
A1           Trial A (inverted)
A2           3
A3           -
A4           -
B1           4 (inverted)
B2           6
B3           9
B4           -
B5           -

PRINCIPLES OF THE SUCCESSIVE SERIES 


Since the test was designed for clinical use as well as for application as an 
ordinary performance test, the range of difficulty represented is very wide 
indeed. Nevertheless, the nature of the progression of the problems is such 
as to make it noticeable, at least implicitly, to most subjects, and the time- 
limits are sufficiently stringent to allow head room except at the very highest 
levels of ability. 


A. The first series of problems consists of what may be termed basic opera- 
tions: the joining of likes to form larger masses of ‘solid’ colour, the uniting 
of mirror halves, apprehension of figure-ground relationships, and, in prob- 
lem 4, the first step towards rotation. No diagonal construction is required. 


B. At this stage the concept of the diagonal is introduced, in a variety of 
forms, but hingeing basically on realization that ‘half’ areas must be combined 
in contrary positions to produce the required effect. From Problem 3 the 
introduction of a ‘solid’ piece is required, to deal with designs which combine 
the features of solid mass and diagonal construction. Shifts in figure-ground 
relationships as between successive designs are also involved. 


[Line drawings, pp. 44-45: TRIST-MISSELBROOK-KOHS DESIGNS]


C. In the third series the increase in the number of blocks per design is
accompanied by a general decrease in symmetry throughout the designs.
Furthermore, the very meagre time allowance makes it imperative to act
quickly if anything approaching full credit is to be achieved. The result is
probably to penalize what Goldstein and Scheerer call the ‘concrete’ or ‘total
matching’ approach in contrast to the ‘abstract’ or ‘analytic’ method. The
sequence of problems is arranged so as to require alternation between
contrasted approaches. Thus, Problem 1 is relatively ‘open ended’, i.e. it may be
tackled either way; Problem 2, with its solid mass of red, strongly evokes
‘total matching’; Problem 3 has a broken, disorganized design which
stimulates the analytic approach, in terms of which it can be solved very quickly;
Problem 4 shows a return to organization, plus fairly strong figure-ground
suggestion.


D. The fourth (and, properly speaking, final) series makes extended use of
the feature contained in Design 17 (the last) in the original Kohs series,
that of elimination of the outside boundary line. It carries strong potentialities
for disorientation, due to the heavy influence of the gestalt qualities of the
figures, and particularly the difficulty of apprehending these in relation to the
frames, of which three out of the four are set obliquely to the rectilinear edges
of the board. The sequence of the designs reflects a counterpoint of
organization and disorganization.


ADDITIONAL SERIES 


In the form of T-M-K used in the Services two additional series of problems, 
E and F, were used, in order to provide added discrimination at the upper end 
of the intelligence scale. These consisted of a re-presentation of Boards C and 
D, but with the instructions to make the designs in reverse, i.e. to put blue in 
place of yellow and vice versa, and similarly with red and white. While this 
achieved its end in relation to scoring, it appears neither to add to the 
examiner’s knowledge of the nature of the subject’s thought processes, nor 
to evoke behaviour substantially different from that associated with his 
handling of the earlier series. Considerations of time and attitude would also 
probably preclude a repeat when E and F are used, and it is strongly
recommended that the Repeat be preferred.
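
The colour reversal asked for in series E and F amounts to a fixed substitution applied to every visible face. The sketch below is an illustration only: the representation of a design as a grid of face labels is an assumption, and the example grid is made up for the purpose and is not one of the T-M-K designs.

```python
# Substitution described for series E and F.
SWAP = {"blue": "yellow", "yellow": "blue", "red": "white", "white": "red"}


def reverse_face(face: str) -> str:
    """Reverse one face; a diagonal face is written as e.g. 'red/white'."""
    return "/".join(SWAP[colour] for colour in face.split("/"))


def reverse_design(design):
    """Apply the substitution to every face in a grid of faces."""
    return [[reverse_face(face) for face in row] for row in design]


# Hypothetical 3 x 2 arrangement, for illustration only.
example = [
    ["red", "red/white", "white"],
    ["blue/yellow", "yellow", "blue"],
]

print(reverse_design(example))
```

Applying the reversal twice restores the original arrangement.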
A further possibility, with which the writer has experimented sporadically, 
is to ask the subject to reproduce certain or all of the designs upside down, i.e.
in such a way that if they were lifted out of the frames and replaced lower
side uppermost the correct design would appear. The feasibility of this
addition to the test would of course depend on the coloured faces of the
blocks being placed in standard positions in relation to one another (a con- 
dition which is not invariably met with). It is not now suggested as a desirable 
addition for routine use, but it is undoubtedly a difficult task which sometimes 
acts as a stress situation, and which might prove of diagnostic value in rela- 
tion to occupational requirements, etc. 


ADMINISTRATION


The blocks should be scattered in front of the subject in such a way that no
blue/yellow side is uppermost. (This proviso ensures that the subject must
turn some blocks in making the second sample design.) The examiner then
picks up one of the blocks and calls attention to the various colours; he
should emphasize the fact that all the blocks are identical and interchangeable.


Sample Board: The sample board is placed open in front of the subject, with 
the frames nearest him, and the blocks beyond it. The first sample design is 
demonstrated, showing how the blocks fit into the frame. When it is clear that 
the subject has understood he should be told, ‘Now do the second one
yourself.’ If the subject makes to take the blocks from the first frame, he
should be told, ‘Just leave those meantime.’ When the subject has finished,
he should be told, ‘That was just for practice; now we are going to begin.
Work as quickly as you can. If you find a design is proving too difficult,
leave the blocks as they are and go on to the next; then if you have time you
can come back to it later.’ (These exact words need not, of course, be used.)
If the subject asks whether there is a time-limit, he should be told there is one,
but that he is not allowed to know what it is.


Board A. Open the first board and place it in the same position as the sample
board, saying, ‘Here is the first set: there are four designs, each requiring 
four blocks. Ready? Go!’ 

If the time-limit is reached before the subject is finished, note how many 
blocks have been correctly placed, and allow up to about 30 seconds for 
him to finish the design on which he has been working, but do not let him 
start another. The closure should not be applied in such a way as to discourage
the subject. When he has finished get him to remove the blocks and put
them back with those that have not been used. If he asks whether they should
be mixed, say that it is immaterial.


Board B. Substitute Board B for Board A with the comment, ‘This time
there are five designs, each again requiring four blocks’, and proceed as 
before. If Design 5 is made with a single red block set squarely in the middle 
of the frame, score 0, but point out that in every case the available space must 
be filled. If the subject is anxious to try, say that there will be a chance to do
so later. 


Board C. As before, but this time point out that each design requires 6 
blocks. If the full white block in the lower left corner of Design 2 is not put 
in, point out the error as in the case of B5—at the end of the period allowed 
for Board C.


Board D. Call attention to the peculiar feature of this board, using some such
form of words as, ‘This time you notice that the outline of the design has not
been drawn, but as before you have to make the design in the frame provided;
fill it with blocks so that all the lines are in the same direction as in the pattern.’
Add any further explanation necessary, short of demonstrating or making 
reference to any specific part of the solution. 


Repeat. Allow the subject to make whatever brief comment he likes, so long 
as a lengthy discussion is not allowed to develop. As soon as convenient, 
introduce the Repeat, e.g. as follows. ‘That was very good, but we also like 
to see how people do once they have got accustomed to the blocks. So let’s 
start again, right from the beginning [‘even the easy ones’ may be added if a 
near-perfect performance has already been achieved]. Perhaps you will be able 
to knock something off your time—but it doesn’t matter if you don’t.’

The Repeat is administered in exactly the same way as the first application, 
except that introductory comments may be abbreviated or omitted, and maximum
encouragement should be given. This latter may be interpreted very
freely; the subject may even be told that he has ‘saved’ so many seconds. If 
he is making poor progress call attention to whatever success he has achieved. 


As before, stop him at the end of the problem during which the time-limit
is reached, but if he is very anxious to attempt an outstanding problem allow
him to do so.

For the purpose of keeping a check on the time taken for each design it is 
sufficient to keep a cumulative time record, i.e. to note the time at which each 
design is completed (or abandoned). If desired, the time taken for each 
design may later be found by subtraction, and noted. 
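
The subtraction described above can be sketched as follows. This is only an illustration: the function name and the sample record are hypothetical, and the times are assumed to be noted in seconds from the word ‘Go!’.

```python
def per_design_times(cumulative_seconds):
    """Return the time spent on each design, given a cumulative time record.

    cumulative_seconds holds the stop-watch reading noted as each design
    is completed or abandoned, in order of working.
    """
    times = []
    previous = 0
    for mark in cumulative_seconds:
        if mark < previous:
            raise ValueError("cumulative times must not decrease")
        times.append(mark - previous)
        previous = mark
    return times


# Hypothetical record for the four designs of one board.
record = [12, 31, 58, 95]
print(per_design_times(record))  # [12, 19, 27, 37]
```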


SCORING 


Each block correctly placed scores one point, except that in Series A Designs 
1 and 2 do not score at all. Errors in these two designs are so rare as to make 
it justifiable to regard them as accidental and therefore not to be penalized. 

Partial credit should be given for designs left incomplete, or which contain 
errors. All such designs are scored for the state in which they stand when the
time-limit is reached. If a design which is nearly correct is ‘spoiled’ by sub- 
sequent adjustment this should be noted, but the higher score cannot never- 
theless be credited, except as provided for in the next paragraph. This is the 
principal reason why the subject should be discouraged from ‘destroying’ 
a faulty or incomplete design before going on to the next. 

If the subject finishes with plenty of time to spare he should be invited to 
check his solution. No penalty is exacted for a correction made, but it is 
important that the opportunity to check should not be given only when the 
subject has let an error pass unnoticed. A design ‘spoiled’ during the checking 
period should be given the score it merited when the subject first said he 
was finished. 
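
The basic rule of one point per correctly placed block, with Designs 1 and 2 of Series A unscored, may be sketched as below. The function name, the labelling of designs as ‘A3’, ‘B2’ and so on, and the sample figures are assumptions made for illustration; the counts entered are taken to be those standing when the time-limit was reached (or, for a design spoiled during checking, when the subject first said he was finished).

```python
# Series A, Designs 1 and 2 do not score at all.
UNSCORED = {"A1", "A2"}


def tmk_raw_score(blocks_correct):
    """Sum one point per correctly placed block, skipping A1 and A2.

    blocks_correct maps a design label (e.g. "B3") to the number of
    blocks credited as correctly placed in that design.
    """
    return sum(
        count
        for design, count in blocks_correct.items()
        if design not in UNSCORED
    )


# Hypothetical partial record: A4 and B2 were left incomplete.
example = {"A1": 4, "A2": 4, "A3": 4, "A4": 3, "B1": 4, "B2": 2}
print(tmk_raw_score(example))  # 13
```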


INTERPRETATION


In spite of its simplicity, and to many subjects its relative familiarity as compared
with the tests of concept-formation, T-M-K is probably the richest in
interpretative possibilities of all the tests described in this manual. It evokes 
patterns of diagnostic interest at all levels of behaviour, and in particular the 
attitudes (and sometimes anxieties) it rouses are extremely varied. 

At Pemberley T-M-K was used as the basic test in the performance battery 
(see p. 6); it was administered to all candidates in a 30-minute test period 
devoted to it alone and always preceding the application of any other per- 
formance test. It early became apparent, however, that contrary to expecta- 
tion, to some candidates it presented a more formidable and disturbing 
situation than either the ‘difficult’ Semeonoff-Vigotsky or the complex and 
time-consuming Carl Hollow Square. It has been suggested that the bright
colours of the cubes arouse something like Rorschach ‘colour shock’ (19,
pp. 276-8) in some subjects.¹ No data having specific bearing on the possibility
of colour shock have been collected, and the first author is of the
opinion that it would operate only with seriously disturbed patients. A 
cognate but more superficial effect is, however, quite often encountered: to 
many subjects the material carries a strong suggestion of kindergarten 
equipment, and various attitudes may be aroused in consequence. Some 
subjects are offended by what they take to be tantamount to an invitation
to regress, and tend to treat the problems with unmerited contempt. Ultimately,
this is probably interpreted correctly as a defensive attitude, similar
to that expressed in the excuse for an inferior performance which some
subjects tender: ‘I could have done it all right when I was at school [or ‘in
the nursery’], but not now.’

¹ To meet this possibility a closely similar test using black and white cubes was designed
by Misselbrook, but this has never been standardized.

More acute anxiety still is likely to be aroused if for any reason the subject 
is led to suppose that this apparently childish task has been specially selected 
for him. All these attitudes would appear to indicate insecurity of one form 
or another; they also carry implications of extrapunitiveness. 

A detached ‘rational’ approach to the test is relatively uncommon. It does 
not seem to be predominantly associated with either of the broadly contrasted 
methods of dealing with the problems distinguished by Goldstein and 
Scheerer. Indeed, the distinction between the ‘concrete’ and ‘abstract’ 
methods, although unquestionably valid and easily observable during test 
performance, does not, in our experience, seem to have clearly defined 
personality correlates, at any rate with normal subjects. What does seem to 
be significant is occurrence or absence of breakdown when the subject’s ‘pre- 
ferred’ method fails to produce results. It is particularly interesting to observe 
the behaviour of a subject using the ‘total matching’ technique who has 
started on, for example, D2 on the basis of faulty orientation. Such a subject 
often becomes oblivious of time, and as a result may not have time to proceed 
to other designs on which he might have scored more easily. In such a case 
the scoring element of the test becomes relatively meaningless, as also in 
cases where a misperception of orientation or dimension (e.g. of the width 
of the ‘zig-zag pattern’ in D4) or of relationships (e.g. in D3) may lead to 
high or low partial scores which bear little relation, except in arbitrary terms, 
to success in carrying out the instruction ‘make a design like the pattern’. 

Peculiarities in the handling of the blocks themselves are also worth noting. 
The following list is by no means exhaustive, nor is the interpretative signifi- 
cance always clear. 

(i) Some subjects appear to remain unaware that the blocks are indeed 
interchangeable, or at any rate seem not to act upon the knowledge. Such 
a subject will turn a block over in his hand searching in vain for a desired 
colour, and then discard it in favour of another one. Other subjects, similarly, 
find the colour they want, but seem fated to take a long time in doing so. 
Either pattern seems to indicate some form of dissociation from the test
situation.

(ii) Subjects who pick up the required number of blocks (usually in the 
non-preferred hand) before starting are probably literally showing their need 
for ‘something tangible to hold on to’. This behaviour does not appear to be 
related to prudence or forethought. 


(iii) A pattern occasionally seen, though less frequently than in the case 
of Carl Hollow Square (see p. 66) is that of building the design outside the 
frame, and then lifting it in. Although this may sometimes help in carrying 
out a reproduction based on the ‘total matching’ procedure, it usually 
amounts to a failure to take advantage of the structuring provided, and as 
such is probably related to the inability to learn which accompanies various 
forms of lesion or disturbance. The examiner should guard against making 
depth interpretations on the basis of what is after all surface behaviour; thus 
a symbolic avoidance of trammelling enclosure should not be hastily inferred. 

(iv) The way the subject disposes of the completed designs at the end 
of a series is sometimes of interest. The more usual practice of mixing the 
blocks, or getting the subject to do so, was abandoned at Pemberley follow- 
ing an administration when this was inadvertently omitted and the subject 
noticed that C2 could be lifted bodily into the frame for D1 and put right 
by the adjustment of only one piece. (It is possible, incidentally, to use C2 
without any alteration, other than rotation through 45°, to produce D2.) 
Acute perception of this sort coupled with what may be regarded as presence 
of mind seldom occurs, and may be likened to the Vigotsky performance 
described on p. 24. Subjects are, however, quite often able to utilize smaller 
ready-placed units of more than one block, and the organizational ability 
which this implies (possibly analogous to Beck’s Rorschach Z (2) ) is worth 
noting. Standardization in terms of non-disturbance of patterns from the 
previous series would probably be preferable, for this reason, to randomizing, 
but it has been found that some subjects tend to be inhibited by patterns left 
standing. It seems advisable, therefore, to allow the subject to decide (if he 
will) as provided for in the instructions (p. 47 above). 

Varieties of attitudes and behaviour more closely associated with test 
performance proper have probably more clearly defined interpretative
significance. For the most part they are related to the learning elements in 
the test situation, either between the successive designs on the successive 
series, or in relation to the Repeat. 

Taking the latter point first, it may be claimed that the most valuable 
single feature of T-M-K is its double series of norms by means of which it is 
possible to compare the original and the repeat performance and to define
what degree of improvement in raw score is to be regarded as ‘normal’.
Unfortunately, there is insufficient headroom in the test to allow superior
performance on the original application to ‘improve’ in terms of standard score
(see Table 15, p. 85), but below about the 90th percentile of the general 
population this difficulty does not arise. Most of the gain on the Repeat will 
be found to accrue from Series C and D, except in the case of slow subjects 
who have failed to complete Series B on the first application. 


It may be noted, as any user of the test who collects cumulative time 
records for himself can verify, that a slight slowing down on the easier 
designs is normal and is probably usually due to the slowing down of tempo 
produced by the ‘set’ associated with the difficult problems of series D. It 
should not be regarded as an indication of any sort of disturbance, even if it 
leads to a loss of one or two points on Series A or B. On the other hand,
a marked decrease of raw score over the series as a whole, which occurs 
more frequently than one might expect, may be confidently regarded as an 
indication of instability. 

It may also be noted that the initial difficulty of C1 very often causes the 
subject to take up to 40-45 seconds (or more) to solve it; once it has been 
dealt with the more difficult designs C2 and C3 (and even C4) can be dealt 
with much more rapidly, so that it is not unusual for complete or almost 
complete success in Series C to be achieved after what appears to be, on the 
surface, a bad start. No such time-pattern appears to hold good for Series D. 
In this the break in difficulty appears to be after D1. Either D2 or D3 may 
prove to be a stumbling block. The solid mass of the former would seem to 
favour the concrete approach, but many find the orientation insurmountably 
difficult; the latter might be regarded as likely to lend itself to analytical 
methods, but it seems that here again orientation causes difficulty, since the 
vertical position in which the two most readily identifiable diagonals are 
seen tends to exaggerate their lengths. 

These difficulties in performance have been described at some length since
the beginner (particularly one who has not himself had experience of the test 
as subject) may be too readily inclined to categorize them as disorders of 
perception verging on the pathological. They do, nevertheless, often indicate 
an imbalance between perception and execution. Goldstein and Scheerer call 
attention (12, pp. 35, 49) to a grosser form of this type of imbalance. The 
more grotesque misproductions there illustrated are to some extent precluded 
by the provision of the frames, of which only severely disturbed subjects fail 
to take account. 

An interesting form of failure, however, is sometimes found in Series D, 
in which the subject appears to be overwhelmed by the difficulty of striking 
a balance between red and white areas, and produces a design containing five 
or even six solid red pieces. It will be found, rather paradoxically, if one 
observes a design of this sort developing, that it is due to a too minute and 
as it were dissociated attention to single edges of the red areas. As such this 
type of response would appear to be analogous to Rorschach dd, and some at 
least of the implications of the latter seem to apply (see e.g. Mons, 29, pp. 
58-9). This example is particularly noteworthy in so far as it calls attention 
to the necessity for observing and interpreting the process by which a design
is made, which may be totally obscured by the contradictory appearance of 
the end-product. 

A second main diagnostic feature of T-M-K is the opportunity it gives for 
observing reaction to frustration. This occurs predominantly in relation to
the short time-limit allowed for Series C. As already noted (p. 41) the test 
was designed to be useful clinically and the time-limit by ordinary intelli- 
gence test standards would be regarded as quite obviously inadequate. 
Nevertheless, since Series C is practically the only source of headroom for 
those subjects to whom Series D presents insurmountable difficulties, it 
would appear to serve a useful purpose. Furthermore, if one avoids the temp- 
tation to treat T-M-K (or any other) scores as a direct measure of ‘intelligence’ 
the objection loses much of its force. Acceptance of the contemporary trend 
towards regarding scores on a particular test as indications of efficiency in 
dealing with a specific problem situation, the significance of which is largely 
a matter of interpretation, rather than as measures of intelligence or some 
other ‘ability’, is one of the fundamental assumptions underlying the use of 
the methods discussed in this manual. In the context of T-M-K, again, it is 
necessary to regard the double application of the test as the behavioural
unit under consideration. Subsequent adjustment to or recovery from the 
frustration of the initial application can then be assessed. Alternatively, if 
the ‘reversed-colours’ Series E and F (see p. 46) are used, opportunities for
recurrence of frustration phenomena will appear at the appropriate points, 
although it is possible that the type of individual most liable to inhibition 
through frustration would also be most strongly affected by the reversal of col- 
ours, which might thus obscure the issue through adventitious reinforcement. 

A further discussion of the view of frustration underlying the designs of
this part of T-M-K will be found in Rosenzweig’s discussion of ‘The experimental
measurement of types of reaction to frustration’ in Murray’s Explorations
in Personality (32). The tests there described were very different from
T-M-K, but the principles are clearly applicable to the interpretations of be- 
haviour in the present test as well, and were used by Trist in his formulation. 

It had also been hoped to demonstrate that certain of the interpretations 
attached to direction of movement, disorientation, faulty perception of con- 
figuration, etc. in the Bender-Gestalt test (4) might be applicable, when 
suitably translated into terms of the Kohs blocks medium. Bell (3) notes that 
much unpublished work on the Bender test was carried out in the U.S. 
services, which suggests that with graphic material the observations would 
not be too difficult to make, but the indications which could be expected to 
be expressible in block design terms are mostly related to quite severe patho- 
logical states. It is perhaps unfortunate, however, that the Pemberley psycho- 
logists were unfamiliar with the work at the time of the investigation. 


POINTS FOR FURTHER INVESTIGATION 


Apart from validation of the interpretation of behaviour patterns specific to 
T-M-K performance, scope for modification lies principally in the need for 
better discrimination at the upper levels of achievement, particularly since, 
as already noted, it is there that comparison with the Repeat score breaks 
down. The obvious remedy would be to extend the range of the test at the
difficult end, at the expense of cutting down the easier problems, perhaps 
combining Boards A and B into a single series. On the other hand, the test 
as it stands constitutes a convenient single instrument which covers nearly 
all levels of ability, and the present structure also represents a carefully 
planned series of problems (see pp. 43ff) which it would be a pity to disturb, 
except on very good grounds. 

The most promising possibility would appear to be the incorporation of 
a final series taken from the black-and-white versions previously mentioned 
(Footnote, p. 49). Since the black-and-white designs are direct adaptations 
from coloured T-M-K counterparts the black-and-white blocks carry only 
three ‘relevant’ sides (black, white, and diagonal); the other three sides bear- 
ing combinations of black and white which cannot be used for any of the 
designs. This would then present an additional feature to be dealt with—the 
avoidance of ‘non-relevant’ sides. Such an extension of the test would be 
worth investigating, as would, of course, the usefulness of the black-and- 
white series per se and in relation to possible colour shock as mentioned on 
p. 49.

A final suggestion is the introduction of prompting technique, possibly 
associated with the Repeat only, or with the initial application only. The 
present time-limits would, however, seem to preclude prompting in Series C, 
and it would therefore probably have to be confined to Series D, where it 
would in any case be most usefully applicable. 


CHAPTER 5


The Carl Hollow Square


The Carl Hollow Square, a ‘form-board’ test embodying both conventional 
and individual features, is the only test described in this manual which is not 
‘original’ in the sense of having been devised or developed in whole or in 
part by the authors and their co-workers. It was, however, extensively used
in the Pemberley investigation, and to a less extent at Garston, so that norms 
and other data relating it to other tests discussed are available. It is therefore 
treated herein in equally full detail. 

Very little has been written about the Carl Hollow Square, and its general 
reputation is not very high, but this is probably due to its having been treated 
as a test of general intelligence. Despite the numerous high correlations with 
accepted intelligence tests, singly and in combinations, which the author 
quotes, its validity as an intelligence test seems very doubtful, even allowing 
for its high factor loadings (see Chapter 8). Nevertheless it presents many 
interesting diagnostic features which commended it to those responsible for 
drafting the Pemberley programme. It may be added that reviews of the test 
in the Nineteen-Forty and the Third Mental Measurements Yearbooks (5, 6) 
were rather inaccurate (and contradictory) in the information they gave 
about it. 

Carl gives credit for the basic idea to E. A. Lincoln, author of the Lincoln 
Hollow Square (23), a form-board test for children which does not, however, 
make use of the bevelled edges which are the most individual feature of the 
Carl Hollow Square, although a similar device had been previously exploited, 
less systematically, in the Ferguson form-boards (9). 


MATERIAL AND BASIC PRINCIPLE 


The material consists of 29 blocks, of uniform thickness, and a ‘hollow 
square’—a thin wooden board out of which is cut a hole approximately 4$ 
inches square. 

The test consists of a graded series of problems each of which requires the 
subject completely to fill the square with a given selection of blocks (usually 
four, but in two cases three). The blocks have edges which are either ‘straight’, 
i.e. at right-angles to the surface, or ‘oblique’, i.e. bevelled at an angle of
about 60°. The straight edges all fit against the edge of the square; the
bevelled edges are always placed one against another so as to ‘run with’ each
other, i.e. without leaving a V-shaped gap between blocks.

There are 26 possible combinations of blocks that will fit the square and 
comply with these conditions, but since some are virtually duplications only 
19 are used in the test. These 19 combinations (or problems) are set out in 
order of presentation in Table 6 at the end of this Chapter. 

The blocks are identified by a letter and number code. All blocks of the 
same general shape carry the same letter, as follows: 


Group   Shape                                      Number of different
                                                   blocks
A       ‘long’ rectangles                          3
B       ‘short’ rectangles                         8
C       triangles                                  4
D       ‘half-size’ triangles                      1
E       ‘quarter-size’ triangles                   4
F       ‘long’ rectangles with one corner cut      3
G       ‘short’ rectangles with one corner cut     3
H       irregular ‘L-shaped’                       2
I       irregular                                  1
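
The letter and number code can be checked against the stated total of 29 blocks with a short sketch such as the following. The group sizes are read from the table above, taking Group A to contain three blocks (as the piles A 1-3 in the suggested layout indicate); the names used are hypothetical.

```python
# Number of different blocks in each lettered group (Group A assumed to be 3).
GROUP_SIZES = {
    "A": 3, "B": 8, "C": 4, "D": 1, "E": 4,
    "F": 3, "G": 3, "H": 2, "I": 1,
}


def block_codes(group_sizes=GROUP_SIZES):
    """List every block code, e.g. 'A1', 'A2', ..., 'I1'."""
    return [
        f"{letter}{number}"
        for letter, size in group_sizes.items()
        for number in range(1, size + 1)
    ]


codes = block_codes()
print(len(codes))  # 29
```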


As in the case of the other tests discussed, the following detailed instruc- 
tions simplify those in an earlier cyclostyled pamphlet. 


ADMINISTRATION: PRELIMINARY 


Carl (7) recommends a standardized form of instructions (albeit delivered 
‘in a natural tone’) and a rather complicated layout of the blocks. In our 
experience the former is definitely inadvisable, and a simpler if perhaps less 
methodical layout (see below) has proved adequate. It should, however, be 
stressed that since scoring depends both on time and on moves, and since it is 
important also to take as full notes as possible, the beginner may experience 
some difficulty in maintaining control of the situation in the early stages of his 
experience. It is advisable, therefore, to establish one’s own standardized
conditions, or to adopt the following arrangement (see plate facing p. 32), 
which has been found to be convenient. 

The subject should be seated on the opposite side from the examiner of a
not too broad table, with the ‘square’ placed between them. The blocks
should be arranged in piles on the examiner’s left, the scoring sheet immediately
in front of him, and the stop-watch and schedule of presentations on his
right. The pieces may conveniently be piled as follows:


  A      B      B      B
 1-3    1-3    4-6    7-8
  C      D      E      F
 1-4     1     1-4    1-3
  G      H      I
 1-3    1-2     1


The use of the scoring sheet is almost essential. Important features are: 


(i) Space in which to record moves by tally. It is altogether impracticable 
to attempt to count moves mentally. A mechanical counter, provided it can 
be operated silently, is a possible alternative, but it places greater demands 
on the examiner’s attention. 


(ii) A wide margin for ‘running commentary’. 


If the surface of the table is similar in colour to the test material it is 
advisable to place a sheet of plain white paper under the square. 


EXPLANATION OF PROBLEM 


Little difficulty will be experienced in explaining the nature of the task. 
Subjects of all ages and levels of intelligence are at least apparently able to 
comprehend the instructions; whether all the finer points are fully grasped 
is less certain at the outset, and it often happens that a point which the sub- 
ject claims to have understood appears to be ‘forgotten’ or neglected at a 
later stage of the performance when stress has developed. 

No set form of words need be used, and indeed, a subject of high intelli- 
gence, as Kent (18) points out, often shows signs of impatience to get started. 
In such a case it is permissible to tell him—on a man-to-man basis, as it were 
—that one is obliged as examiner to point out all the salient features. 


57 


DIAGNOSTIC PERFORMANCE TESTS 


The points to which attention must be called are as follows: 
(i) After it has been explained that the task is to fill the hole with the 
pieces provided, it should be pointed out that the hole is square, and that
this has something to do with the solution, but that one is not allowed to 
say in what way. This is a proviso specifically laid down by the author of the 
test; we have formed the impression that it is unnecessary to labour the 
point, since the information can be conveyed in another way (see next 
paragraph). 

(ii) The two kinds of edge, straight and bevelled, should be demonstrated 
using Block A1. Show how the long straight edge goes against an edge of the 
square; if it is placed in turn against two or more edges, this may serve to 
call attention to the fact that it is immaterial in what orientation the blocks 
are placed within the square. 

(iii) Next, using A1 and A3, show how the bevelled edges fit together,
and that the V-shaped join mentioned above does not constitute a ‘fit’. 
Finally, point out that although the straight edges may appear to ‘fit’ against 
one another, this is bound to be ‘wrong’, since the rules require that all 
straight edges should go against the edges of the square. 

(iv) It should be mentioned that there is a time-limit, which varies for each 
problem, but that the subject is not allowed to know what the limits are. The 
necessity for speed should not be stressed. If the subject asks about moves 
(which is unlikely) say that these will be counted, ‘for interest’. In general,
the subject should, if he asks, be advised to work at whatever tempo suits 
him best. 

(v) Carl further recommends that the subject should be told that the 
problems are graded in difficulty, but that they may appear to become easier 
as the principles are grasped. In the writer's opinion this is quite unnecessary 
and even inadvisable, since it may for instance arouse anxiety in a subject 
who finds difficulty with Problems 3 or 5, both of which are, in terms of 
time and moves allowance (see Tables 7 and 8) of more than average difficulty, 
although they appear early in the series. An exception may be made in the 
case of a subject who appears to consider the early problems too easy for 
him, or in that of a subject who shows extreme frustration and discourage- 
ment. 


ADMINISTRATION: GENERAL 


When it is clear that everything is understood, collect the pieces for the first 
problem and place them in a pile where they may be conveniently picked up 
by the subject. Say ‘Number One; Ready? Go!’—and start the stop-watch.
Count the moves in a tally of fives (criteria of what constitutes a move are 
dealt with in the appropriate section, pp. 61-62 below). 

When the subject has succeeded in filling the square, stop the watch and 
note the time, having first made sure that the solution is correct, in particular 
that no bevelled edge is overhanging an edge of the square. If an error has
been overlooked, point this out at once, and allow the subject to correct it, 
up to the normal time-limit. No penalty is attached to such prompting. 

Put the blocks not required for the next problem back in their original 
piles, present those required for the next problem, and proceed as before. 

If a time-limit (see Table 7) is exceeded the subject may either be stopped 
(provided this is not done too abruptly), or, preferably, permitted to con- 
tinue until he has solved the problem. In either case time and moves are noted, 
and included in the totals (for use in calculating the ‘Rate correction’, see 
section on Scoring below). 

If the subject wants to give up a problem before the time-limit has been 
reached he should be given every encouragement to continue. If, on the other 
hand, he has obviously ceased trying, stop the watch, note time and moves as 
before, and take care to note the problem as a failure, scoring no points. 
The subject must not be allowed to sit idle waiting for the time-limit, as this 
would affect his rate of work and his final score, as described below. 

No ruling is offered as to the number of failures beyond which the test 
need not be continued. As already mentioned, the reputed progression in 
difficulty is not very obvious, and it would seem to be unsafe to assume that 
no further score would be made beyond any particular point in the series. 
It must be left to the examiner to decide when a given application ceases to 
be informative, but in any case a low degree of reliability must be attached 
to any score derived from less than the full series. 

The subject’s interest and participation must be allowed to be a deciding
factor, and attention may be found to flag in face of the rather formidable 
length of the test series. For the same reason the examiner should guard 
against keeping the subject waiting while he performs operations which could 
well be postponed until after the test; it is not necessary, for example, to 
count the tallies after each problem. 


SCORING 


The score is based on a combination of credits for time and moves, with a
final correction for rate of work.
Each problem, irrespective of difficulty or position in the series, carries
a basic score of 4 points. This basic score may be adjusted by the application
of a bonus or penalty in respect of (a) relatively short or long time taken,
and (b) a relatively small or large number of moves. Schedules of bonuses
and penalties appear as Tables 7 and 8 at the end of this chapter.


A full account of the standardization of the scoring method, including the 
fixing of the limits for the various bonuses and penalties, is given by Carl. 
For the present purpose it will suffice to say that the time-limits were arbi- 
trarily set at points which cut off for each problem the lowest 10 per cent of 
the standardization group, and the various bonuses and penalties to cut off 
the percentages of the remainder of the standardization group set out in 
Table 5. 


Table 5  CARL HOLLOW SQUARE: SOURCE OF BONUS AND PENALTY VALUES

                 Percentage of standardization population

Bonus (+) or
penalty (-)         Time              Moves

    +2           highest 10
    +1           next 20           highest 25
     0           middle 40         middle 50
    -1           next 20           lowest 25
    -2           lowest 10


It will thus be seen that the range of possible scores for each problem 
solved within the time-limit extends from +7 to +1. A problem not solved 
within the time-limit of course scores 0. 

The best way to record scores is first to enter the Time bonuses and 
penalties in the appropriate column of the scoring sheet, by referring to the 
Time schedule. 

Next, add up the tally of moves for each problem, and enter the Moves 
bonuses or penalties in the same way, referring to the Moves schedule. 
Problems failed are best dealt with by drawing a line right through the bonus/ 
penalty and score spaces for the problem in question. Enter the score for 
each problem in the Score column by adding the algebraic sum of the two 
bonuses or penalties to the basic score (4) for each problem. Then obtain the 
total of the entries in the score column. 

A check may be made by totalling the Time and Moves
bonus/penalty columns and adding the algebraic sum of these totals to four
times the number of problems solved. This should give the same result as the 
total of the score column. 
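
The arithmetic of the score column and of the check total may be set out as a short sketch. The function names and the layout of the records (one tuple per problem) are assumptions made for illustration only; the basic score of 4 and the zero score for a failed problem are as stated above.

```python
BASIC_SCORE = 4  # basic score for each problem solved, as stated in the text


def problem_score(time_bonus, moves_bonus, solved=True):
    """Score for one problem: basic score plus both bonuses/penalties, or 0 if failed."""
    if not solved:
        return 0
    return BASIC_SCORE + time_bonus + moves_bonus


def total_with_check(records):
    """records: list of (time_bonus, moves_bonus, solved) tuples (hypothetical layout)."""
    # Total of the Score column.
    total = sum(problem_score(t, m, s) for t, m, s in records)
    # Check: column totals added to four times the number of problems solved.
    solved = [(t, m) for t, m, s in records if s]
    check = (BASIC_SCORE * len(solved)
             + sum(t for t, _ in solved)
             + sum(m for _, m in solved))
    if total != check:
        raise ValueError("score column and check total disagree")
    return total


# Illustrative values only: three problems solved, one failed.
records = [(2, 1, True), (0, -1, True), (-2, 0, True), (0, 0, False)]
print(total_with_check(records))
```

In practice the check is of use mainly when the Score column has been entered by hand, since the two totals are then arrived at independently.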

The final step is to apply the ‘Rate correction’. This is an allowance designed
to offset the effect of a predominantly trial-and-error approach, on
the assumption that it may on occasion lead to ‘chance’ success. The reason- 
ing is that a ‘planful, analytic, "intelligent" solution of a problem often 
suffers, quantitatively, in comparison with a pell-mell, hit or miss, trial and 
error performance’. In other words, it is argued that an application of 
Moves bonuses or penalties may unjustly favour quick random movement 
and penalize the subject who pauses to think out what he is doing. 

A correction for ‘rate of work’ is therefore applied, in terms of which a 
small additional bonus is attached to any performance carried out at an all- 
over average tempo slower than the mean rate of the standardization group, 
and a similar penalty to any performance faster than the same mean rate.

The rate correction is calculated as follows: Divide the total number of
moves (M) by the total time taken, in minutes (T). The result to the nearest
whole number is then subtracted from the ‘standard’ (i.e. mean) rate, which
may be taken as 9 (actually found to be 8.9) moves per minute. This gives the
Rate correction, expressible as 9 - (M/T).

This number is added to the total score as previously determined. It will be 
seen that rates slower than the standard rate thus increase the score; those 
faster than the standard decrease it. 
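
The calculation may be illustrated by the following sketch. The function name is hypothetical, and the treatment of an exact half when rounding (here rounded upward) is an assumption, since the text specifies only 'the nearest whole number'.

```python
import math

STANDARD_RATE = 9  # moves per minute; the observed mean is given in the text as 8.9


def rate_correction(total_moves, total_minutes):
    """Return 9 - (M/T), with M/T taken to the nearest whole number."""
    rate = total_moves / total_minutes
    rounded = math.floor(rate + 0.5)  # assumption: exact halves are rounded upward
    return STANDARD_RATE - rounded


# Illustrative values only.
print(rate_correction(150, 25))  # a slow worker: 6 moves per minute
print(rate_correction(220, 20))  # a fast worker: 11 moves per minute
```

A subject working at 6 moves per minute thus gains 3 points, while one working at 11 moves per minute loses 2.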

A discussion of the validity and discriminative value of the Rate correction 
will be found under Points for further investigation (p. 67 below). Users 
of the test may prefer to dispense with it and establish their own norms based 
on the uncorrected scores. The labour involved in calculating the correction 
would prima facie certainly appear to outweigh its usefulness. 

The correction must, however, be applied if use is to be made of Carl’s 
device for converting raw score into ‘I.Q.s’. For adult male subjects this is 
accomplished by adding 31 points to the raw score, and for adult female 
subjects 39 points. Mental ages, if desired, may be obtained from the I.Q.
using a basal age of 16. ‘Sub-adult’ I.Q.s are also calculated by this method.

The statistical method by which this relationship was established is not clear 
from Carl’s paper. He himself seems to have been surprised to find it; he 
writes: ‘Strangely enough the addition of one or the other of these constants 
(i.e. 31 for men; 39 for women) aligned the distribution of Hollow Square 
results fairly well with those of most of the other tests.’ 

In view of the above and since the Carl Hollow Square is not to be regarded
as a test of "intelligence", it is not recommended that the conversion be used,
except as a quick means of placing a subject roughly in relation to the general 
population. Alternative norms will be found along with those for other 
tests in Chapter 7. 
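
Purely to illustrate the arithmetic of Carl's conversion, and without prejudice to the reservations just expressed, the steps may be sketched as follows. The constants 31 and 39 and the basal age of 16 are taken from the text; the function names are hypothetical, and the ratio formula used for mental age (I.Q. multiplied by basal age, divided by 100) is an assumption, since the method is not spelled out here.

```python
IQ_CONSTANT = {"male": 31, "female": 39}  # constants quoted from Carl


def carl_iq(corrected_raw_score, sex):
    """Carl's 'I.Q.': rate-corrected raw score plus the constant for the subject's sex."""
    return corrected_raw_score + IQ_CONSTANT[sex]


def mental_age(iq, basal_age=16):
    """Mental age in years, assuming the conventional ratio relationship."""
    return iq * basal_age / 100


# Illustrative values only.
iq = carl_iq(70, "male")
print(iq, mental_age(iq))
```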


COUNTING OF MOVES 


Criticizing the whole system of Moves scores, Kent (18) further claims that 
the Moves count is bound to be unreliable, on the grounds that it is difficult 
to lay down satisfactory criteria of what should and what should not count 
as a move. This view is not confirmed in our experience. It has been found
possible to train the learner to count moves reliably, and the criteria would 
appear to be logical and consistent. 

The principal criterion is that a move is counted whenever the edge of a 
block is placed in contact with an edge of the square, or with an edge of 
another piece, whether the second piece is already in the square or being held
in the hand.

Alternatively, one may define as a move any placement or attempted place- 
ment of a block within the square or in relation to another block. Exceptions 
or apparent exceptions to this rule are as follows: 

(i) Accidental contact is not counted as a move. 

(ii) Moving a block merely to check whether the bevelled or straight edges 
are correctly placed does not count as a move, provided that it is put back 
immediately; the criterion here is that no other move should intervene. 

(iii) Turning a block over in the hand or holding it above the space to 
gauge whether it will fit does not count, so long as no contact is made. 

(It will be seen that these exceptions are indeed ‘apparent’). 

A move should be counted if a block is moved to a new position in the 
square, but taking a block out of the square does not of course in itself con- 
stitute a move, since the block will eventually have to be put back. 

If two or more blocks are fitted together in the hand or on the table outside 
the square, the minimum number of moves counted is one less than the 
number of blocks so fitted together. If they are then put into the square as a 
unit, one move is counted for this placing. 

The net result is that the minimum number of moves in which a problem 
can be solved is always identical with the number of blocks involved; i.e. 
one move must be counted for dealing correctly with each block, and any 
incorrect placement will be bound to add one to the Moves count. 
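
The reckoning in the last two paragraphs may be verified by a short sketch. It is assumed, for illustration, that an errorless solution is described by the sizes of the groups in which the blocks reach the square (1 for a block placed singly); the function name is hypothetical.

```python
def minimum_moves(group_sizes):
    """Fewest moves countable when blocks are placed in groups of the given sizes."""
    total = 0
    for n in group_sizes:
        fitting = n - 1  # moves counted for fitting n blocks together outside the square
        placing = 1      # one move for putting the block, or the assembled unit, in place
        total += fitting + placing
    return total


# Four blocks: two placed singly and two fitted together first.
print(minimum_moves([1, 1, 2]))
# Four blocks assembled completely outside the square and lifted in as a unit.
print(minimum_moves([4]))
```

In both cases the count comes to four, i.e. the number of blocks involved, whatever the grouping.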


INTERPRETATION 


Very little indeed has been written about the Carl Hollow Square, and that 
mostly in relation to special abilities. Further, since our own experience of 
the test has not been in a clinical setting, patterns having interpretative 
significance can be indicated only tentatively and in a general way. Carl 
himself attached much importance to the potentialities of the test as a means 
of arriving at an understanding of the cognitive processes of the individual, 
particularly in relation to his educational history and general background. 


In addition Carl writes as follows regarding the use of the test in the study 
of personality: ‘It goes without saying that the test lends itself to the making 
of qualitative observations concerning the subject, which in many instances 
are just as valuable, or more so, as the quantitative indices of performance. 
The individual’s reaction to the test as a whole, and his mannerisms in going 
about doing it, will often furnish valuable clues to various aspects of personality
or temperament. While there is probably not sufficient fineness in the discrimin- 
ation it affords between opposite poles of such characteristics to allow for 
categoric description or diagnosis, it certainly is true that watching an indi- 
vidual do the test gives at least some insight into some general phases of his 
mental make-up. The “go-getter” reacts very differently from the “timid 
soul”—even though the latter may realize a higher score. In certain psychia- 
tric senses the reaction patterns are illuminating as, for instance, in exposing 
retardation, depression, agitation, manic states, and so on. The hesitant, the
blindly impetuous, the sensibly deliberate types of reaction all add something 
to the clinical picture and tend to make it clearer. The one who “gives up” 
stands out clearly, as does the subject who bull-headedly persists in trying to 
do an exercise in a way which the more flexible individual would discard after 
he had tried it several times without success. To a degree many of the possible
“qualitative” observations may be supported by the quantitative data for time, 
moves, and speed of movement; but with experience in giving the test many
times, to many kinds of subjects, the examiner will find splendid opportunities 
for adding to his understanding of the individual’s total personality, without 
regard to the actual score he realizes.’ 


CARL HOLLOW SQUARE PERFORMANCE AND LEARNING 


The feature of the Carl Hollow Square which principally commended it for 
inclusion in the Pemberley programme was the ‘learning’ element to which 
reference has already been made. Carl himself makes much of this point. 
The sequence of problems is designed to introduce gradually a number of 
‘principles’ of combination of the pieces, these being repeated in order to 
allow for ‘carry-over’ or ‘interruption’, in order to throw light on considerations
of rigidity/fluidity in the subject's approach to the problems. There is a
certain amount of 'suggestion' in the sequence of problems, i.e. when con- 
secutive problems contain some of the same (or similar) pieces it is sometimes 
difficult to resist the suggestion that a placement already proved ‘correct’ for 
a given problem will also be ‘correct’ for the next in the series. The most 
marked example of this will be found in working out Problems 3-5. 

The following schedule summarizes, in amplification of Carl's statement, 
the sequence in which the principles are introduced. 

(i)     Problem 1   Basic ‘fitting’ of rectangular shapes.

(ii)    2           Reconstruction of rectangular shape from pieces
                    divided obliquely.

(iii)   3-5         Alternative placements of small rectangles.

(iv)    6           Joining of triangles to form a square and of F
                    pieces at right angles to one another.

(v)     7-8         Joining of triangles to form a larger triangle.

(vi)    9           Reconstruction of large triangle from ‘cut
                    rectangle’.

(vii)   10          Recapitulation of stage (iv).

(viii)  11-12       Recapitulation of and ‘resistance to’ stage (vi).

(ix)    13-14       Introduction of irregular pieces (H and I).

(x)     15-19       Recombination of principles already used.


As will be seen, no new principle is introduced after Problem 14. Carl says: 
‘Thereafter the subject who has “learned” the various principles usually has 
little difficulty with the remaining exercises, in spite of the fact that they are 
far more difficult, intrinsically, than the earlier ones.’

This is of course reflected in the more stringent bonus and penalty standards 
at this stage; it will be noted, in particular, that from Problem 14 onwards no 
error must be made if one is to earn the Moves bonus. 

No evidence is adduced for the ‘intrinsic’ difficulty of the later problems; 
indeed it is doubtful whether the relatively limited number of ways in which 
the large irregular pieces may be placed does not in fact make some of them 
easier. In any case, the capacity of the subject to learn is best assessed in 
relation to the sequence of scores, corrected as they are for time and moves 
in each instance. Few subjects, in our experience, are aware of the finer points 
in the grading of the series or of the stages at which new principles are
introduced.

It is of course possible to represent the sequence of scores graphically, and 
this was done as a matter of routine at Pemberley. The curves obtained, 
however, were seldom very informative at a casual inspection, since the trend 
of scores through the series of problems is often obscured by single aberrant 
instances. Some system of smoothing would seem to be necessary, but usually 
inspection of the figures is sufficient by itself to reveal any pronounced trend 
that may be present. Special consideration should be given to points at which 
lapses seem to occur or at which there is a change in level of performance, 
or of inflexion. A cause should always be sought, and is more likely to be 
found in the nature of the subject’s performance on adjacent problems than 
in the actual features of the problem itself. 
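
No particular system of smoothing is prescribed above. As one possibility only, a running median of three successive scores will suppress a single aberrant instance while leaving a sustained change of level visible. The sketch below is offered on that assumption; the function name and the specimen scores are hypothetical.

```python
from statistics import median


def running_median(scores, window=3):
    """Smooth a sequence of problem scores with a centred running median."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)                # the window is shortened at either end
        hi = min(len(scores), i + half + 1)
        smoothed.append(median(scores[lo:hi]))
    return smoothed


# Specimen sequence with two isolated lapses (a failure scoring 0, and a score of 1).
scores = [4, 5, 0, 5, 6, 6, 1, 7]
print(running_median(scores))
```

The unsmoothed figures should of course be retained, since it is precisely the isolated lapses for which a cause is to be sought.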


OTHER FEATURES OF PERFORMANCE 


A group of idiosyncrasies in performance is related to failure to respond 
altogether adequately to the fact that the space to be filled is square. The 
implication of its squareness (see Explanation of Problem (i) p. 58) is of course
that success in solving the problem should not, theoretically, be affected by 
the orientation in which the subject has chosen or chanced to begin building 
up the square. It is of course understandable, on the other hand, that all 
positions are not in fact equally easy; the horizontal-vertical illusion and the 
subject’s laterality may be cited as possible contributory factors. The 
direction of slope of the bevelled edges also exerts an influence: it is un- 
doubtedly easier to deal with a space that is not partly obscured by an over- 
hanging bevelled edge. It should not, therefore, be assumed that even an 
apparently purposeless alteration of a placement implies poor insight on 
the part of the subject. Often a ‘fresh slant’ on a problem is quite literally 
and deliberately being sought. Carried to excess, however, such behaviour 
must be regarded as at least mildly abnormal, and genuine cases of failure to 
recognize that the positions are relationally interchangeable occur.
Perseverative tendency will show itself in repeated reversion to the position
initially tried. There is also a strong suggestion of compulsiveness in this 
behaviour.

Insecurity is shown in a tendency to abandon correct placements of the 
first two pieces without fully exploring the possibilities of placing the others. 

Complementary, in a sense, to this is the tendency to persist with what 
should, after a single trial, be recognized as obviously an impossible beginning.
(The commonest instance of this is repeated attempts in Problem 11 to fit 
F1 and F3 together at right angles.) Here again perseverative tendency may 
be shown, as of course in all instances where the problem requires a new 
relationship to be substituted for one already encountered. 

A final example of a response pattern which seems to imply imperfect
grasp of instructions is persistent ignoring of the rules relating to the edges. 
Thus some subjects may be seen to place the first piece considered with a 
bevelled edge overhanging the edge of the square, and, almost inevitably, a
straight edge in or near the middle of the space. The simplest hypothesis to 
account for such behaviour is that the subject is unable to pay attention to all 
the requirements at once, i.e. he may be concerning himself exclusively with 
the problem of achieving a ‘fit’, and consequently ignoring the edges. As 
such it is probably analogous to comparable behaviour in Trist-Hargreaves, 
and may be interpreted accordingly. Nevertheless it sometimes seems to 
indicate something like the rather deep-rooted oppositional trend of which 
‘description’ as opposed to ‘story’ in T.A.T. is a manifestation (see Wyatt,
50). The rationale would be that the social stress of the test situation has
motivated the subject’s ego-defences in a way which prevents him from free 
participation. 

Two rather more generalized methods of handling the material remain to 
be noticed: 

(i) The marked tendency, already mentioned in relation to T-M-K, to fit 
the pieces together outside the square, sometimes even to the extent of com- 
pleting the solution, and then lifting the assembled pieces carefully into 
position. 

(ii) A rarer but equally characteristic tendency to ‘feel’ the edges, usually 
by running the fingers along the bevelled edges or testing the right-angled 
corners of the straight edges. The temptation to equate such manipulation 
on a priori grounds with the preferred activity of the ‘haptic’ personality 
type (Lowenfeld, 25) is undeniably strong. Any such connexion, however, 
still awaits validation, and with it, of course, identification with the ‘intuitive’ 
approach to problem-solving. The rather more obvious indication that the 
subject who feels the edges is seeking reassurance is probably better founded. 

The foregoing discussion has been concerned chiefly with inadequacies in 
approach and performance. Unusually effective performance is of course 
equally interesting; and in our experience ‘good’ performances occur rather 
less frequently than Carl’s norms would seem to suggest. In other words, the 
discrepancies between the ‘I.Q.s’ obtained and the results of other standard 
tests too frequently suggest that the Carl Hollow Square underestimates the 
subject’s ability. The explanation may be in the nature of the population 
samples studied, or in faults in the original standardization. It is, however, 
interesting that subjects whose Carl Hollow Square performance ‘holds up 
with’ or surpasses their other test results fall rather clearly into three cate- 
gories: 

(a) Persons highly skilled in mechanical occupations and pursuits: those 
possessed of what is sometimes called ‘fitter’s eye’.

(b) Subjects with a high degree of test-sophistication, who are not con- 
cerned to maintain prestige in relation to mechanical pursuits. 

(c) Efficient secretaries and other people whose work benefits from mild 
obsessional tendencies. 

With the possible and partial exception of the last group the common 
factor appears to be absence of neurotic tendency. At the same time it may 
be noted that Halstead and Slater (14) in a rehabilitation study found that a 
combination of previous experience and high scores on Carl Hollow Square 
gave the best prediction in selection for training in engineering occupations. 
The inference would appear to be that in this instance, at least, the test 
situation was not unduly anxiety-provoking. 


POINTS FOR FURTHER INVESTIGATION 


The immediate need is for general validation of the interpretative indications 
discussed above, both in their own right and in relation to similar behaviour 
in other test situations. This latter point is put forward with reservations, 
since the Carl Hollow Square seems frequently to arouse a different pattern 
of attitudes from the other tests in this battery. 

If sufficient evidence that the test is a useful one is forthcoming it might 
be worth while to try to shorten it. The sequence of stages outlined on p. 64 
seems to have less empirical significance than the author attached to it, and 
some of the later problems, in particular, may be regarded as not discriminative
and consequently redundant. As the test stands, half an hour is a conservative
estimate of the average time required. Time might also be saved
without detriment by shortening some of the time-limits. 

The cumbersome scoring system, which has attracted unfavourable com- 
ment, might also be revised, although such validation as it has been possible 
to carry out suggests that on statistical grounds the ‘Rate correction’ stands
up very well. Since the Pemberley data were not available, only the Garston 
data could be considered, and the highly selected nature of this population 
restricts the value of the findings. Nevertheless it is worth recording that each 
of the three score components showed a low correlation with Officer In- 
telligence Rating (O.I.R.) (see Chapter 7, p. 86), as follows: 


Time 0.13     Moves 0.27     Rate 0.14


Time and Moves being assessed on the basis of bonuses and penalties. 
Rate inevitably showed a negative correlation with Time, but both the other 
intercorrelations were positive (and all were significant at the P = 0.01 level).
Assuming linearity of regression in all cases, which is a little doubtful,
multiple correlation with O.I.R. produces the following approximate weights:


Time 2.0     Moves 3.3     Rate 3.7


These figures also suggest a reconsideration of the weights assigned to the
Time and Moves components in the method at large, but two points should 
be borne in mind: (1) that all data are derived on the basis of the current in- 
structions, which emphasize Time rather than Moves; and (2) that what one 
is doing in applying the test is assessing Carl Hollow Square performance 
and not predicting O.I.R. or any other criterion of general intelligence.
Further work with different criteria and a more representative sample would, 
therefore, appear to be necessary. 


Table 6  CARL HOLLOW SQUARE: ORDER OF PRESENTATION OF PROBLEMS

Problem        Pieces

    1      A1   A2   A3
    2      A1   A3   G1   G2
    3      A1   B1   B2   B3
    4      A1   B4   B5   B6
    5      A1   B1   B2   B4
    6      A1   B7   C1   C2
    7      D1   E1   E2
    8      E1   E2   E3   E4
    9      E1   E2   C1   F1
   10      C1   F1   C2   F2
   11      C1   F1   C3   F3
   12      A1   C1   E3   G1
   13      A1   B7   G2   H1
   14      B7   H1   B1   I1
   15      E1   F2   G2   H1
   16      A3   B1   G1   T
   17      A3   I1   C3   G3
   18      B7   B8   H1   H2
   19      B8   H1   C4   F2


Table 7  CARL HOLLOW SQUARE: TIME SCORING SCHEDULE
         (times in seconds, or in minutes-seconds)

Problem   Limit            Bonus                               Penalty
                     +2          +1           0                  -1

    1     2-00     up to 7      8-11      12-30             31 to 1-04
    2     1-30     up to 10    11-15      16-32             33-56
    3     4-50     up to 19    20-37      38 to 1-33        1-34 to 2-35
    4     4-00     up to 25    26-38      39 to 1-37        1-38 to 2-44
    5     7-00     up to 23    24-45      46 to 2-29        2-30 to 4-29
    6     3-50     up to 21    22-33      34 to 1-22        1-23 to 2-23
    7     1-0      up to 12    13-15      16-29             30-41
    8     2-00     up to 16    17-25      26-52             53 to 1-23
    9     3-00     up to 22    23-36      37 to 1-24        1-25 to 2-18
   10     2-45     up to 18    19-33      34 to 1-11        1-12 to 1-56
   11     4-0      up to 25    26-51      52 to 1-59        2-00 to 3-24
   12     3-0      up to 19    20-34      35 to 1-17        1-18 to 2-10
   13     6-00     up to 20    21-37      38 to 1-52        1-53 to 3-59
   14     2-45     up to 23    24-33      34 to 1-09        1-10 to 1-57
   15     2-5      up to 16    17-26      27-54             55 to 1-24
   16     1-30     up to 17    18-23      24-43             44 to 1-10
   17     2-15     up to 15    16-25      26-52             53 to 1-14
   18     2-50     up to 18    19-31      32 to 1-07        1-08 to 1-55
   19     2-30     up to 19    20-30      31-55             56 to 1-41


Table 8  CARL HOLLOW SQUARE: MOVES SCORING SCHEDULE

Problem      Bonus                      Penalty
              +1           0              -1

    1         3          4 to 6       7 or more
    2         4          5 or 6       7 or more
    3       4 to 6       7 to 16     17 or more
    4       4 to 6       7 to 17     18 or more
    5       4 to 7       8 to 22     23 or more
    6       4 or 5       6 to 11     12 or more
    7         3          4 or 5       6 or more
    8         4          5 to 8       9 or more
    9       4 or 5       6 to 12     13 or more
   10       4 or 5       6 to 11     12 or more
   11       4 or 5       6 to 15     16 or more
   12       4 or 5       6 to 10     11 or more
   13       4 to 6       7 to 15     16 or more
   14         4          5 to 9      10 or more
   15         4          5 to 8       9 or more
   16         4          5 to 7       8 or more
   17         4          5 to 8       9 or more
   18         4          5 to 9      10 or more
   19         4          5 to 9      10 or more
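

For those who prefer to score mechanically, the Moves schedule lends itself to a simple look-up. The sketch below is illustrative only: the dictionary is a transcription of the Moves schedule above, holding for each problem the largest count earning the bonus and the largest count earning neither bonus nor penalty, and the names used are hypothetical.

```python
# Problem number: (largest count scoring +1, largest count scoring 0).
MOVES_SCHEDULE = {
    1: (3, 6),   2: (4, 6),   3: (6, 16),  4: (6, 17),  5: (7, 22),
    6: (5, 11),  7: (3, 5),   8: (4, 8),   9: (5, 12),  10: (5, 11),
    11: (5, 15), 12: (5, 10), 13: (6, 15), 14: (4, 9),  15: (4, 8),
    16: (4, 7),  17: (4, 8),  18: (4, 9),  19: (4, 9),
}


def moves_bonus(problem, moves):
    """Moves bonus (+1), nil (0) or penalty (-1) for a problem solved within the limit."""
    bonus_max, zero_max = MOVES_SCHEDULE[problem]
    if moves <= bonus_max:
        return 1
    if moves <= zero_max:
        return 0
    return -1


# Illustrative values only.
print(moves_bonus(1, 3), moves_bonus(5, 22), moves_bonus(19, 10))
```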


CHAPTER 6


The Revised Passalong Test 


The Passalong test was originally devised by Alexander (1) as a performance 
test suitable for children of from seven to fifteen years of age. The version 
presented in this manual was devised by Semeonoff as part of the Pemberley 
investigation with the object of making the test more suitable for adult sub- 
jects, for whom, it had been found, the original scoring system afforded in- 
sufficient discrimination. The revision consisted almost exclusively of 
evolving a scoring system similar to that used for the Carl Hollow Square, 
with certain modifications. Full details of the method of scoring will be 
found on pp. 74-75 below. 


NATURE OF THE PROBLEM AND MATERIAL 


The basic task is essentially one that requires the handling of spatial relation- 
ships. The main difference from conventional form-board tests is that 
the starting-point of each problem is defined as well as the end-position, so 
that the problem, as it were, unfolds itself in time. A fair degree of success 
may be achieved by trial and error, but with at least the more difficult prob- 
lems the subject has to work out rudimentary principles, and proceed through 
a series of sub-goals; planful behaviour is thus involved more directly than in 
any other test described herein. 

The material consists of red and blue blocks, approximately $ inch thick 
and 1 inch square (‘small’ blocks), 1 x 2 inches (‘long’ blocks), or 2 inches
square (‘large’ blocks). (The numbers of each kind required are noted in 
Table 9A.) These have to be moved about in ‘boxes’, which are shallow trays 
with a raised rim about 3 inch high. The sizes of the five boxes used are 
quoted in Table 9B to the nearest inch; in fact, they are a little bigger, enough 
to allow the blocks to be moved easily in directions parallel to the edges of 
the box, but not enough to allow any block to be moved ‘round a corner’, 
i.e. so as to involve a change in position through 90 degrees.


Table 9A  PASSALONG: SPECIFICATIONS OF BLOCKS

Colour                  Size
           1 x 1 in.   1 x 2 in.   2 x 2 in.

Red            2           1           1
Blue           6           2           1


Table 9B  PASSALONG: SPECIFICATIONS OF BOXES

Box      Size          Problems

 A       2 x 2 in.     1
 B       3 x 2 in.     2, 3
 C       3 x 4 in.     4, 5
 D       4 x 3 in.     6, 7, 8, 9, (10)


Each box has one pair of opposite edges painted red and blue respectively; 
the other pair are unpainted. (In Table 9B the length quoted first is that of 
the painted edges.)

Each problem (with one exception) is presented with the blue blocks at the 
red end and one or more red blocks at the blue end. In all cases the task is to 
move the red block or blocks to the red end by pushing the blocks to and fro 
in the spaces not taken up by the other blocks. For each problem a card 
showing the ‘end-position’ is provided, and the general instruction is that the 
blocks must be left in the position shown on the card. 


SETTING UP PROBLEMS 


Each problem (except Problem 3) is set up by placing the blocks in the box
exactly as in the end-position diagram (see diagram, p. 73) taking out the red
block(s), sliding the blue blocks to the red end, and replacing the red block(s) 
at the blue end. This ensures correct placing of the asymmetrical arrange- 
ment in Problem 8 (and the optional Problem 10, see p. 75, below). 
Problem 3 is set up as follows: using Box B (as for Problem 2), place two 
small blue blocks at the red end, and two small red blocks, with one blue
block between them, at the blue end. The end-position is exactly the same
as for Problem 2. 


ADMINISTRATION 


It will be found that the nature of the task is easily grasped, even by subjects 
of low intelligence, and that a minimum of explanation will be necessary. 

Place Box A before the subject, with the red end away from him, and the 
diagram card where it can be easily seen while the subject is working on the 
problem. The red end of the box as depicted on the diagram should also be 
away from the subject. 

Explain that the blocks have to be moved about inside the box in such a 
way as to allow the red block to reach the red end, as shown in the diagram, 
and that the blue pieces must also be left as shown in the diagram. Demon- 
strate how the blocks can be moved by sliding the red block from side to 
side. Invite the subject to try for himself. In the unlikely event of his failing, or
if he is unwilling to try, demonstrate the solution. Then again invite him to 
try for himself. Finally explain that this was a practice trial, and that ‘we 
shall now begin’. 

The subject should be told to work as quickly as possible; no mention 
should be made of moves. Further encouragement to work quickly is un- 
likely to be necessary, but may be given at the examiner’s discretion. 

When the subject has signified that he understands and is ready, Problem 2
is presented. The performance is timed and the moves are counted, as in the
Carl Hollow Square. One should not, however, attempt to keep a tally of the 
moves as the performance is likely to be too rapid to permit of this.

If the subject fails to solve a problem within the time-limit, allow him to
continue until he has finished (if it is only a matter of time) or until he pauses 
of his own accord. In the latter case replace the blocks in the starting position, 
and tell the subject to watch carefully. Demonstrate the solution slowly, with 
pauses at appropriate points; always begin by moving the red block to the 
subject’s left. 

In general, the subject should not be allowed to give up before the time- 
limit is reached. If he spontaneously remarks that he is ‘stuck’, but feels he 
could do better if he could begin afresh, the pieces may be replaced in the 
starting position. In such a case one point should be deducted from the score 
for that problem (see Scoring, p. 74 below); time and moves expended up to 
this point should be included in the totals. 

If a time-limit is reached at a point when the subject has succeeded in 
bringing the red block(s) to the red end, but has not had time to rearrange the 
blue blocks as shown in the diagram, one point is allowed.


[Diagram: Passalong end-positions]

If the subject leaves the blue pieces at the blue end incorrectly arranged this 
should be pointed out, and the subject allowed to continue until the normal 
time-limit, if necessary. If he succeeds, still within the time-limit, one point is
deducted; if not, the situation reduces itself to that described in the previous
paragraph.


SCORING


As already noted, the scoring system adopted is based on that of the Carl 
Hollow Square, with the following differences: 

(i) Each problem carries a basic score of 5 points instead of 4. Problem 1 is
for practice only and is not scored, nor are time and moves recorded. 

(ii) Certain problems have bonuses for time and moves but have no (or 
limited) penalties. 

(iii) Certain deductions, already mentioned, are applied. 

As a result of these modifications the score for a successful solution varies 
between a maximum of 8 points and, in general, a minimum of 2, unless a 
deduction has been made. Discounting deductions, Problem 7 has a minimum 
score of 3 points and Problems 8 and 9 each a minimum of 5 points. These 
discrepancies may be justified on the ground that the later problems are 
sufficiently difficult to merit additional weight if they are solved at all. 
Historically they stem from the fact that in the standardization group they 
were solved by such small proportions of subjects that the division of the
successful performances into percentage groups as in the Carl Hollow Square
(see p. 60) would have had to be based on such small numbers as to make the
fixing of limits extremely arbitrary and tentative. Consequently the percentage
distribution determining the limits for bonuses and penalties was based on 
the whole sample, not the successful group, as in Carl Hollow Square. (See 
also Points for Further Investigation, below.) 

The system of deductions covers all possibilities except the hypothetical 
case of a subject who has had a fresh start and, having used a number of 
moves within the penalty range, requires a prompt for position of the blue 
pieces and fails to correct it in time. No such case has occurred in our experi- 
ence, but although rigid application of the deduction would result in a 
negative score, a zero score only would be recorded. 
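
The way in which the components of a problem score combine may be illustrated by a minimal Python sketch. It assumes a basic score of 5, takes the time and moves limits for Problems 4 and 8 from Tables 10 and 11, and treats a deduction as a simple subtraction with the total floored at zero; the Rate correction is left out. The names `problem_score`, `lookup`, and the schedule dictionaries are hypothetical and serve only for illustration.

```python
# Illustrative sketch only: names are hypothetical, not part of the manual.
BASIC_SCORE = 5  # assumed basic score per problem

# Time schedule in seconds: (upper limit, adjustment), as read from Table 10.
TIME_SCHEDULE = {
    4: [(12, 2), (16, 1), (30, 0), (50, -1), (180, -2)],
    8: [(35, 2), (150, 1), (240, 0)],
}

# Moves schedule: (upper limit, adjustment), as read from Table 11.
# An upper limit of None stands for "and over".
MOVES_SCHEDULE = {
    4: [(17, 1), (26, 0), (None, -1)],
    8: [(79, 1), (None, 0)],
}


def lookup(schedule, value):
    """Return the adjustment for the first band whose upper limit covers value."""
    for upper, adjustment in schedule:
        if upper is None or value <= upper:
            return adjustment
    raise ValueError("value exceeds the time-limit for this problem")


def problem_score(problem, seconds, moves, deductions=0):
    """Score for a successful solution; the total is never allowed below zero."""
    total = (BASIC_SCORE
             + lookup(TIME_SCHEDULE[problem], seconds)
             + lookup(MOVES_SCHEDULE[problem], moves)
             - deductions)
    return max(total, 0)
```

On these assumptions a solution of Problem 4 in 10 seconds and 15 moves scores 8 points, and a solution of Problem 8 in 200 seconds and 90 moves scores 5, in agreement with the maximum and minimum quoted above.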

Still following the precedent of the Carl Hollow Square, a Rate correction 
is applied to counteract trial-and-error success, etc. Owing to the much faster 
normal rate of work it is convenient to reckon this in moves per second, and 
a table of bonuses and penalties ready worked out (Table 12) is provided, 
along with the Time and Moves schedules, at the end of this chapter. It 
should be noted that the maximum penalty is set at -5; also that in dividing
moves by time in seconds it is sufficient to calculate only to the first decimal
place.
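
The arithmetic of the Rate correction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes that Table 12 runs from +5 for rates of .10 to .19 moves per second, falling by one point for each further tenth, to -5 for 1.10 and over; that the rate is truncated (not rounded) to the first decimal place; and that rates below .10 also receive +5. The function name `rate_correction` is hypothetical.

```python
def rate_correction(moves, seconds):
    """Bonus or penalty for rate of work (assumed reading of Table 12).

    moves and seconds are whole numbers; the rate is taken to the first
    decimal place by truncation, which is an assumption.
    """
    if seconds <= 0:
        raise ValueError("time must be positive")
    tenths = (moves * 10) // seconds  # moves per second, in tenths
    # .1 -> +5, .2 -> +4, ... .6 -> 0, ... 1.0 -> -4, 1.1 and over -> -5
    return max(-5, min(5, 6 - tenths))
```

Thus 30 moves in 60 seconds gives a rate of .5 and a correction of +1, while 120 moves in 60 seconds gives the maximum penalty of -5.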

Results for the first 42 subjects tested at Pemberley yielded a correlation of
0.69 between the revised scoring and Alexander’s and, in general, higher
correlations with the other tests in use.


COUNTING MOVES 


Little difficulty should be experienced if one bears in mind the criterion of a
move as a single adjustment of the pieces. The following guidance may,
however, be helpful.

(i) Two or more blocks moved simultaneously as a unit count as a single move.
On the rare occasions when two blocks are moved in contrary directions, one
with each hand, two moves would be counted.

(ii) If a block is moved round a corner in a single right-angled movement
(as opposed to two distinct pushes) only one move is counted.

(iii) An incipient move, small enough not to occupy the adjacent square
inch, and cancelled, is not counted.

(iv) ‘Centring’ the red block(s) at the end of the problem is not counted 
as a move. (Similarly no penalty is incurred if the red block is left uncentred.) 


ADDITIONAL PROBLEM (No. 10) 


Reference has already been made to an additional problem (No. 10) which,
it was suggested, may be administered following the test proper as time
permits, and if the subject has been successful with Problems 8 and 9. It was
further suggested that a note of time and moves be kept, with a view to
incorporating the items in a later standardization of the test. No subject,
however, in our experience has solved this problem in a time comparable with
those set for the test problems proper, and it would appear that no useful
purpose would be served by making it a regular part of the test.

Readers may, however, care to try it out for themselves. It is set up by 
placing the four small blue blocks used in Problems 6-9 all together at the 
left of Box D and the two long blocks together at the right, in the same 
direction as in the previous problems, i.e. at right angles to the painted 
edges. 

Further comment on Problem 10 will be found in the section on
interpretation (p. 77 below).


INTERPRETATION 


As in the case of the Carl Hollow Square, the principal qualitative feature of 
the Passalong Test is the opportunity it gives for studying the subject’s learn- 
ing capacity. This forms the main topic of the present section, but two other 
points more specific to this test may be disposed of briefly first. 

The first is based on the fact that success by trial and error is more readily 
attained in Passalong than in any other test known to us. A trial-and- 
error approach, however, prevents the emergence of insight into the principles 
of which at least rudimentary awareness is necessary to solve the later prob- 
lems expeditiously and without undue expenditure of moves. Subjects of 
high ability quickly discover all these things for themselves, and careful 
observation of the way in which the subject strikes a balance between a plan- 
ful approach and one which seems to aim at speed exclusively will throw light 
on the subject’s effectiveness in situations which can be regarded as possessing
the same general features as those embodied in the test. The point at which 
trial and error is adopted or abandoned is also of interest. Thus, in the face of 
difficulty a stable subject of high ability is more likely to study the possi- 
bilities more carefully; an unstable subject or one of low ability is, on the 
contrary, more liable to resort to random movement in comparable circum- 
stances. 

The second feature depends on the ‘enforced’ help provided for in the 
Instructions, which require that the solution of a problem on which the sub- 
ject is unsuccessful shall be demonstrated. Contrasted attitudes are revealed in 
(a) the way in which the demonstration is received, e.g. over-submissively, 
resentfully, or with genuine interest, and (b) the degree of attention paid. 

Varieties of response under these two headings are of course interrelated. 
Thus, a characteristic oppositional pattern often observed is that of the 
extrapunitive subject whose self-esteem is wounded by failure in what may 
appear to be a puerile task, and who, in his anxiety to redeem himself, pays 
insufficient attention to the demonstration. 

Consideration of the learning elements in the test must take into account 
the structure underlying the succession of the problems in the series. The 
following phases may be distinguished: 


(i) Problems 1-2: Solution by simple rotation (i.e. regular movement round
the box in a clockwise, or anti-clockwise, direction).

(ii) Problem 3: Introduction of irregular movement.

(iii) Problem 4: Introduction of the ‘right-angled move’, i.e. moving one small
block or pair of blocks to ‘make a path for’ the red block.

(iv) Problems 5-6: Reinforcement of the above principle; introduction of large
pieces reduces scope for trial and error.

(v) Problem 7: Reversal of previous trend: added scope for trial and error
through substituting two long blocks for one large one.

(vi) Problems 8-9: Use of sub-goal introduced, i.e. rearrangement of blue
blocks before red can usefully be moved at all.


Alexander himself makes no direct references to these actual stages as 
such, but that they form a logically developing sequence becomes very clear 
if one compares each problem with those that precede and follow it. (A
problem rather different in general layout from all the others was originally
interposed between Problems 6 and 7, as a ‘habit-breaker’, but was discarded
because ‘it became clear that it was serving no useful purpose’.)

As the series is now constituted it presents (unlike the Carl Hollow Square) 
a clearly discernible progression in difficulty. The only point, however, at 
which the subject is likely to become aware of a plan underlying the sequence 
is in Problems 7-8-9. The essence of the solution of Problem 8 is to reduce 
the arrangement of the blocks to that of Problem 7. When this has been 
achieved the red block can be moved to the red end by the familiar process 
of ‘rotation’. In Problem 9 the position of the blue blocks must similarly be 
reduced to that of Problem 8, but here there is the added complication that 
the red block must be moved into a fresh position while the ‘other half’ is
being dealt with. The concept of a sub-goal is in this way introduced with
Problem 8, but it becomes more obvious in Problem 9.

Another way of regarding these problems is in the formulation of the
principle that from Problem 7 onwards the key is to keep the two long pieces
together, with a pair of small ones on each side of them. Since this situation
already exists in Problem 7, the main break in continuity of difficulty should
theoretically occur after Problem 7; experience, however, clearly shows that
it comes after Problem 8. Problem 10 introduces a further new principle
quite unlike anything that has occurred in the earlier problems: that one
small block must be isolated from the others as an initial step towards the
solution. This appears to be so unexpected to a subject who has attempted to
analyse the problems, that the reproduction of the sequence of moves in
reverse necessary to bring the blue blocks back to the starting position
becomes very difficult indeed, and in fact usually still defeats us. Interesting
considerations regarding the effect of apparent relevance of sub-goals arise,
but these are more pertinent to learning theory than to interpretation of
Passalong test performance as such.




POINTS FOR FURTHER INVESTIGATION 


Restandardization on a larger group would appear to be necessary, and might 
lead to filling in the gaps in the schedules of Time and Moves bonuses and 
penalties, perhaps on the basis of revised time-limits. The present system 
uses Alexander’s original time-limits: these were adopted in order to allow 
test performances to be scored on Alexander’s system until norms became 
available. 

Certain of the points relating to the scoring of the Carl Hollow Square
would bear investigation for Passalong as well. In particular, the applicability
to Passalong of the Carl Hollow Square scoring system, which was assumed,
remains to be demonstrated. The absence of full ranges of bonuses and
penalties to which attention has already been drawn (see p. 74) made it impossible
to carry out a calculation similar to that for Carl Hollow Square reported on
p. 67, and the absence of the full original records precludes calculation from
actual time and moves totals. Such work as it has been possible to do suggests
that, in contrast to Carl Hollow Square, the chief importance lies in the time
element, at least for the easier problems. For the later problems it would
appear that number of moves assumes greater importance; indeed, the superior
discrimination, for adults, of the present scoring system, as compared with
Alexander’s, would seem to lie in the moves component, although the all-over
correlation with Pemberley Intelligence Grading (I.G.) (see p. 86) is a
good deal lower (0.16) than the corresponding correlation of Carl Hollow
Square moves with O.I.R. (see p. 67). There is as yet no evidence that the
‘Rate correction’ serves any useful function. Nevertheless, users of the tests
are recommended to continue counting moves and calculating the correction,
tedious though this may be, until further data have been collected.


Table 10 PASSALONG: TIME SCORING SCHEDULE

                   Bonus                        Penalty
Problem   Limit    +2       +1        0         -1        -2
          min.     sec.     sec.      sec.      sec.      sec.
2         2        0-6      [illegible]
3         3        0-4      5-6       7-9       10-13     14-180
4         3        0-12     13-16     17-30     31-50     51-180
5         3        0-15     16-20     21-60     61-150    151-180
6         3        0-15     16-22     23-47     48-120    121-180
7         3        0-14     15-19     20-50     51-180    -
8         4        0-35     36-150    151-240   -         -
9         5        0-90     91-150    151-300   -         -




Table 11 PASSALONG: MOVES SCORING SCHEDULE

          Bonus              Penalty
Problem   +1       0         -1
1         -        -         -
2         10-12    13-14     15+
3         8        9-10      11+
4         14-17    18-26     27+
5         14-17    18-49     50+
6         14-18    19-34     35+
7         14-23    24-39     40+
8         14-79    80+       -
9         23-99    100+      -

Table 12 PASSALONG: SPEED CORRECTION

Moves per second    Bonus or Penalty
 .10- .19           +5
 .20- .29           +4
 .30- .39           +3
 .40- .49           +2
 .50- .59           +1
 .60- .69            0
 .70- .79           -1
 .80- .89           -2
 .90- .99           -3
1.00-1.09           -4
1.10+               -5




CHAPTER 7 


Norms 




The norms presented in this chapter cover, in addition to the performance 
tests, the various written and individual tests along with which they were 
developed, and to which reference was made in the Preface. Certain of these 
tests, as there explained, form the subject matter of appendices in this 
manual. Information regarding the others is contained in the Summary of
Tests appended to this chapter (pp. 94-96). 

For the performance tests the principal source from which the norms have 
been derived is the Pemberley assessment programme. Since that work had 
of necessity to be carried out along lines dictated by operational requirements, 
sampling could not be done in such a way as to facilitate standardization in 
terms of a distribution approximating to that of the general population. 
Discrimination within the ‘candidate population’ was the main objective, but 
comparison with general population standards was also possible (to a limited 
extent) in respect of scores on tests for which independent norms existed. 
These tests were as follows: 

(a) Written tests 

Both ‘officer’ and ‘general’ Services standardizations of all versions of 
Progressive Matrices either were or eventually became available. Although 
general and officer standardization of the Reasoning test also existed, only 
officer norms were available at Pemberley, and only for the original (English) 
version. Translated versions had of course to be standardized separately, 
and adequate numbers were accumulated for two languages—French and 
Polish. 

(b) Performance tests 

(i) T-M-K. General population norms had been derived from both naval and 
army sources, and there was also an Army Officer standardization for series 
A-F (i.e. including the two ‘reversed colour’ series). None of these studies, 
however, included the Repeat. 


(ii) Carl Hollow Square. The author's own standardization was of course
available.

(iii) Passalong. The author's standardization using the original scoring
system was available.

Additional data on the performance tests were later obtained at the 
‘experimental’ W.O.S.B. (Garston), where, in the summer of 1945, candidates 
were tested with either T-M-K (A-F) and Semeonoff-Vigotsky or Carl Hollow 
Square and Trist-Hargreaves. In this programme nearly all candidates were 
also tested with the Shortened Wechsler Verbal Scale (see Appendix I), as well
as with the normal written test battery. 

Finally, the Semeonoff-Vigotsky test was re-standardized by Laird (22, 
reported by Semeonoff and Laird, 41) on a fairly representative general 
population sample. 

All tests used at Pemberley were at first assessed according to the method 
at that time in use in the Army’s Directorate for the Selection of Personnel 
(D.S.P.), i.e. in terms of a six-point ‘Selection Grade’ (S.G.) scale designed to 
distribute the scores into percentage groups as shown in Table 13, below. (In
this and comparable tables, 16, 18, 39, quoted elsewhere, successive entries
in the ‘percentage’ column should be read as ‘highest 10 per cent’, ‘next 20
per cent’, etc. of the population in question.)


Table 13 PERCENTAGE DISTRIBUTION OF THE SELECTION GRADE SCALE

Grade    Percentage of population
1        10
2        20
3+       20
3-       20
4        20
5        10
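
The allocation of a Selection Grade from a candidate's standing in the population may be sketched as follows. This is a minimal illustration only: it assumes the six grades 1, 2, 3+, 3-, 4, 5 with percentage groups of 10, 20, 20, 20, 20 and 10 reading from the top, and the names `SELECTION_GRADES` and `selection_grade` are hypothetical.

```python
# Assumed reading of Table 13: (grade, percentage of population), top first.
SELECTION_GRADES = [("1", 10), ("2", 20), ("3+", 20),
                    ("3-", 20), ("4", 20), ("5", 10)]


def selection_grade(percent_from_top):
    """Grade for a position expressed as a percentage counted from the top.

    percent_from_top = 10 means the candidate stands at the lower edge of
    the highest 10 per cent of the population in question.
    """
    if not 0 < percent_from_top <= 100:
        raise ValueError("position must lie between 0 and 100 per cent")
    cumulative = 0
    for grade, share in SELECTION_GRADES:
        cumulative += share
        if percent_from_top <= cumulative:
            return grade
    return SELECTION_GRADES[-1][0]  # not reached; kept for safety
```

A candidate standing in the highest 5 per cent would thus be graded 1, and one at the median would fall at the lower edge of grade 3+.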


Where ‘general population’ norms were available (i.e. for 1938 Matrices, 
T-M-K, and Carl Hollow Square) the Pemberley percentage grades were 
based on these. ‘Candidate population’ norms were gradually evolved for 
the tests in use for the first time; all versions of Reasoning had to be included 
among these, and the level of the candidate sample was expected (rightly, as 
it turned out) to approximate more closely to that of a general population 
than to that of an officer candidate sample. 

Summation of grades as heterogeneous as these, with a view to conversion
into an all-over intelligence grading on a point scale is a dubious procedure,
particularly if it is desired to interpret the all-over gradings in relation to a
known population distribution. Nevertheless, it was felt that meaningful
discrimination was achieved, in terms of a sum of admittedly arbitrarily
weighted components.

Percentage grades uniformly based on candidate samples for all tests were 
later substituted, but a more radically improved method of assessment be- 
came available following the adoption by the War Office Selection Boards of 
the system of ‘standard equivalent scores’. The system was introduced simul- 
taneously at Pemberley, but making use exclusively of data derived from the 
candidate population, without any reference to existing norms, from what- 
ever source. Since the various tests had been administered to different sub- 
samples, there was room for error if equivalence were assumed for the means 
and variances of all samples. However, discrepancies were found to be 
negligible within the limits of accuracy required for comparison and com- 
bination of scores, except in the case of Semeonoff-Vigotsky, which had, 
latterly at least, been regarded as ‘difficult’, and consequently was adminis- 
tered principally to candidates in the upper ranges of scores on the written 
tests: allowance for this was made in calculating the standard equivalent 
scores. 

The scale of equivalent scores adopted was based on an assigned mean of 
30 points and an assigned standard deviation of 5 points. Since nearly all the 
distributions were negatively skewed, discrimination is imperfect in the upper 
portions of the scale. Nevertheless the table of equivalent scores as used at
Pemberley is reproduced here (Table 15), modified in the case of the
Semeonoff-Vigotsky by the substitution of figures based on the proposed new
scoring (see Semeonoff and Laird, 41). It must, however, be emphasized that
these tables were drawn up for use in a specific context, i.e. for discrimination 
within a population sample of known characteristics. It is possible to use the 
table as a basis of comparison of scores obtained by a given individual on the 
various tests covered. Assessment of such scores in relation to general 
population standards is also possible by reference to the conversion table, 
which translates the sum of three equivalent scores to the nine-point all-over 
intelligence scale intended to conform to general population percentage 
groups. This, however, is not recommended, and reference should rather be 
made to the broadly grouped, estimated general population percentile norms 
contained in Table 14. 

These are offered as tentative norms, reasonably reliable within the degree 
of accuracy implied by the percentiles quoted. In addition to the Pemberley 
data, use has been made, in drawing up these norms, of results obtained at 
Garston (No. 14 W.O.S.B.) in respect of T-M-K, Trist-Hargreaves, and Carl 
Hollow Square. The Semeonoff-Vigotsky figures are based primarily on a 
rescoring of Laird’s results. All these norms must be regarded as least reliable
at the lower end of the scale, since the lowest intelligence levels were poorly
represented in all the samples studied. Thus, for example, the Naval recruit
norms for T-M-K suggest somewhat lower figures throughout, and
particularly below the 20th percentile, but since this sample was believed to be
biased in the opposite direction we have preferred to make relatively little
use of these figures. It is hoped, in general, that verification or (more
probably) modification of the norms will be possible following the reporting of
further applications of the tests, but we would again call attention (as
relevant in this context) to our view that the psychometric aspect of these
techniques is ultimately secondary in importance to their diagnostic value.


Table 14 PERFORMANCE TESTS: PERCENTILE NORMS

Percentile  Semeonoff-  Trist-      T-M-K   T-M-K   Carl    Revised    Percentile
            Vigotsky    Hargreaves          Repeat  Hollow  Passalong
                                                    Square
95          36          78          72      75      96      45         95
90          46          73          67      72      91      42         90
80          60          68          60      67      84      39         80
75          67          65          57      64      82      38         75
70          72          63          55      61      81      37         70
60          81          58          49      56      75      34         60
50          89          54          45      53      70      32         50
40          96          51          41      50      67      31         40
30          103         48          38      47      65      28         30
25          106         45          35      45      63      27         25
20          110         43          30      43      61      26         20
10          133         39          24      36      55      23         10
5           (150)       31          18      34      50      21         5
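
Reference to the percentile norms can be sketched in Python as a simple look-up. The sketch below uses the Revised Passalong column of Table 14 and returns the highest tabulated percentile whose score the subject has reached; no interpolation is attempted, in keeping with the broad grouping of the norms. The names `PASSALONG_NORMS` and `percentile_band` are hypothetical. For the Semeonoff-Vigotsky column, where a lower score corresponds to a higher percentile, the comparison would have to be reversed.

```python
# (percentile, minimum score) pairs for Revised Passalong, from Table 14.
PASSALONG_NORMS = [(95, 45), (90, 42), (80, 39), (75, 38), (70, 37),
                   (60, 34), (50, 32), (40, 31), (30, 28), (25, 27),
                   (20, 26), (10, 23), (5, 21)]


def percentile_band(score, norms=PASSALONG_NORMS):
    """Highest tabulated percentile reached by the score, or None if below all."""
    for percentile, minimum in norms:
        if score >= minimum:
            return percentile
    return None
```

A Passalong score of 33, for example, reaches the 50th percentile entry (32) but not the 60th (34).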


Similar tables of equivalent scores had been derived at R.T.C. for tests in
regular use at W.O.S.B.s. Two such sets of tables were available, and are
reproduced herein, as Table 17, and Table 38 (in Appendix II). Table 17 covers
the standard W.O.S.B. battery of written intelligence tests, together with other
tests used as ‘confirmatory’ tests or in special circumstances. Table 38, which
was a later development, gives figures derived from an independent
standardization of those tests for which separate norms for different age groups were
required for use in the re-allocation project described in Appendix II.

It should be noted that while all equivalent scores were scaled to a standard 
deviation of 5 points, the assigned means differed, and were as follows: 


Standard W.O.S.B. battery    33.3 points

Pemberley battery            30 points

‘Re-allocation’ battery      40 points
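
The scaling of equivalent scores may be illustrated by a minimal Python sketch. It assumes a simple linear transformation of raw scores to the assigned mean and a standard deviation of 5 points; the manual does not state the exact procedure by which its tables were drawn up, so this is an assumption, and the sample mean and standard deviation used in any call are the user's own figures. The names `ASSIGNED_MEANS` and `equivalent_score` are hypothetical.

```python
# Assigned means of the three batteries; the standard deviation is 5 throughout.
ASSIGNED_MEANS = {"standard": 33.3, "pemberley": 30.0, "reallocation": 40.0}
ASSIGNED_SD = 5.0


def equivalent_score(raw, sample_mean, sample_sd, battery="pemberley"):
    """Linear standard score, rounded to the nearest whole point (assumed method)."""
    if sample_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw - sample_mean) / sample_sd
    return round(ASSIGNED_MEANS[battery] + ASSIGNED_SD * z)
```

With a hypothetical sample mean of 40 and standard deviation of 7, a raw score of 47 would convert to 35 on the Pemberley scale and a raw score of 40 to 40 on the Re-allocation scale.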


Table 15 EQUIVALENT SCORES: PEMBERLEY BATTERY (MEAN 30)

Equivalent  Matrices 1938     Matrices Reasoning         Coding    Equivalent
score       45-min.  20-min.  1943     English  French             score
46                                     40                          46
45                                     39                346-      45
44                                     37-38             337-345   44
43                   60                36       40       327-336   43
42                   59       38       35       39       318-326   42
41                   57-58    36-37    33-34    37-38    309-317   41
40                   56       35       32       35-36    300-308   40
39          59-60    54-55    34       30-31    34       290-299   39
38          58       53       32-33    29       32-33    281-289   38
37          57       51-52    31       27-28    30-31    272-280   37
36          55-56    50       30       26       29       263-271   36
35          54       48-49    28-29    24-25    27-28    253-262   35
34          52-53    47       27       23       25-26    244-252   34
33          51       45-46    26       21-22    23-24    235-243   33
32          50       44       24-25    20       22       225-234   32
31          48-49    42-43    23       18-19    20-21    216-224   31
30          47       41       21-22    17       18-19    207-215   30
29          45-46    39-40    20       15-16    17       198-206   29
28          44       38       19       14       15-16    189-197   28
27          43       36-37    17-18    12-13    13-14    179-188   27
26          41-42    35       16       11       12       170-178   26
25          40       33-34    15       9-10     10-11    161-169   25
24          39       32       13-14    8        8-9      151-160   24
23          37-38    30-31    12       7        6-7      142-150   23
22          36       29       11       5-6      5        133-141   22
21          34-35    27-28    9-10     4        3-4      123-132   21
20          33       26       8        2-3      1-2      114-122   20
19          31-32    24-25    6-7      1                 105-113   19
18          30       23       5                          94-104    18
17          29       21-22    4                          85-93     17
16          27-28    20       2-3                        -84       16
15          26       18-19    1                                    15
14          24-25    17                                            14
13          23       15-16                                         13
12          22       14                                            12
11          20-21    12-13                                         11
10          19       10-11                                         10
9           17-18    9                                             9
8           16       8                                             8
7           15       6-7                                           7
6           13-14    5                                             6
5           12                                                     5
4           10-11                                                  4


Table 15 (continued) EQUIVALENT SCORES: PEMBERLEY BATTERY (MEAN 30)

Equivalent  Trist-        Carl Hollow   Passalong     Equivalent
score       Hargreaves    Square                      score
46                                                    46
45                        116 & over                  45
44                        112-115                     44
43                        110-111                     43
42                        107-109                     42
41                        104-106                     41
40                        101-103       48 & over     40
39          80            98-100        46-47         39
38          78-79         96-97         45            38
37          75-77         93-95         44            37
36          73-74         90-92         42-43         36
35          70-72         87-89         41            35
34          67-69         84-86         40            34
33          65-66         82-83         38-39         33
32          62-64         79-81         37            32
31          60-61         76-78         36            31
30          57-59         74-75         34-35         30
29          55-56         71-73         33            29
28          52-54         68-70         32            28
27          50-51         66-67         30-31         27
26          47-49         63-65         29            26
25          44-46         60-62         28            25
24          42-43         57-59         26-27         24
23          39-41         54-56         25            23
22          37-38         52-53         24            22
21          34-36         49-51         22-23         21
20          32-33         46-48         21            20
19          29-31         43-45         20            19
18          27-28         40-42         18-19         18
17          24-26         38-39         17            17
16          21-23         35-37         16            16
15          19-20         32-34         14-15         15
14          16-18         29-31         13            14
13          14-15         26-28         12            13
12          11-13         24-25         10-11         12
11          9-10          21-23         9             11
10          6-8           18-20         8             10
9           4-5           15-17         6-7           9
8           -3            -14           5             8
7                                                     7




The apparently unwieldy figure used for the standard battery was chosen so 
as to allow a sum of three equivalent scores (S.E.S.3) to range round a mean 
of 100. This sum was used as the basis for conversion to the Officer
Intelligence Rating (O.I.R.), an 11-point percentile scale, distributed approximately
normally for the serving officer population (see Table 18). This structure was 
chosen in order to conform to the familiar idea of ‘marks out of 10’, with 
the position of the zero (corresponding to general population average) such 
as to give meaning to the concept of ‘zero officer intelligence’. At Pemberley 
the intelligence grading (I.G.) was a 9-point S.E.S.3 conversion (see Table 16) 
ranging from 9 down to 1, rather than to 0, since it was felt that in this con- 
text the imputation of ‘zero intelligence’ might occasionally give offence. 
Different assigned means were chosen for the Pemberley and Reallocation 
batteries in order to discourage any attempt to make direct comparisons 
between scores obtained under different conditions and assessed in relation 
to different populations. The original figures are reproduced here, for 
similar reasons, and in order to avoid any confusion that might have resulted 
following rescaling, if reference were being made to existing documents. 


Table 16 PEMBERLEY BATTERY: CONVERSION OF SUM OF THREE EQUIVALENT SCORES
TO INTELLIGENCE GRADING

Sum of 3      Intelligence   Theoretical Percentage   Obtained Percentage of
Equivalent    Grading        of General Population    Pemberley Population
Scores
107-          9              5                        10
101-106       8              10                       16
96-100        7              10                       20
91-95         6              15                       13
84-90         5              20                       18
80-83         4              15                       11
75-79         3              10                       6
63-74         2              10                       4
-62           1              5                        2
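
The conversion of Table 16 lends itself to a simple look-up, sketched here in Python. The lower limits are transcribed from the table, reading the third row as grading 7 and the seventh as 75-79; the names `LOWER_LIMITS` and `intelligence_grading` are hypothetical.

```python
from bisect import bisect_right

# Lowest sum of three equivalent scores qualifying for gradings 2 to 9.
LOWER_LIMITS = [63, 75, 80, 84, 91, 96, 101, 107]


def intelligence_grading(ses3):
    """Pemberley Intelligence Grading (1 to 9) for a sum of three equivalent scores."""
    # Each lower limit reached raises the grading by one step above 1.
    return 1 + bisect_right(LOWER_LIMITS, ses3)
```

A sum of 90, for instance, gives a grading of 5, and any sum of 62 or less a grading of 1.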


In some tables certain figures have been obtained by simple extrapolation.
In instances where the limits of scores actually obtained are known to the
authors, extrapolated figures are in italic. It will be noticed that in certain
columns of the tables the entry ‘. . . and over’ occurs; i.e. extrapolation has
not been carried beyond an arbitrarily imposed ‘ceiling’ of 45 points (about
2.33 σ above the assigned mean). This was done to avoid undue weighting of
exceptionally high scores on certain tests as compared with tests in which
‘possible’ scores did not, because of skewing, reach the equivalent of 45 points.


EQUIVALENT SCORES: ARMY OFFICER POPULATION (MEAN 33.3)


Table 17 (a) STANDARD WRITTEN BATTERY (GROUP TESTS)


33 32 11 36-38 6 
31-32 31 10 33-35 5 
30 29-30 9 30-32 4 
29 28 8 27-29 3 
28 27 7 24-26 2 
26-27 26 6 22-23 1 
25 25 5 20-21 
24 24 4 17-19 
23 22-23 3 14-16 
21-22 21 2 11-13 
20 20 1 8-10 
-19 -19 0 - 


EQUIVALENT SCORES: ARMY OFFICER POPULATION (MEAN 33.3)


Table 17 (b) CONFIRMATORY TESTS (INDIVIDUAL)

Equivalent   Shortened Wechsler                       Trist-Misselbrook-Kohs   Equivalent
Score        Compre-   Simil-    Vocab-    Whole      A-D     E-F     A-F      Score
             hension   arities   ulary
45 23 41-42 79 and over 48 and over 45
44 20 40 77-78 45-47 121-124 44
43 22 39 75-76 43-44 117-120 43
42 19 21 38 73-74 76 41-42 112-116 42
41 18 37 72 73-75 39-40 108-111 41
40 20 36 70-71 71-72 36-38 104-107 40
39 17 19 35 68-69 68-70 34-35 99-103 39
38 34 66-67 65-67 32-33 95-98 38
37 16 18 33 65 63-64 30-31 90-94 37
36 17 32 63-64 60-62 28-29 86-89 36
35 15 31 61-62 57-59 25-27 81-85 35
34 16 30 59-60 55-56 23-24 77-80 34
33 14 15 29 57-58 52-54 21-22 73-76 33
32 28 56 49-51 19-20 68-72 32
31 13 14 27 54-55 47-48 16-18 64-67 31
30 13 26 52-53 44-46 14-15 59-63 30
29 12 25 50-51 41-43 12-13 55-58 29
28 12 24 49 38-40 10-11 51-54 28
27 11 11 23 47-48 36-37 8-9 46-50 27
26 22 45-46 33-35 6-7 42-45 26
25 10 10 21 43-44 30-32 3-5 37-41 25
24 9 20 41-42 28-29 1-2 33-36 24
23 9 19 40 25-27 0 28-32 23
22 8 8 18 38-39 22-24 24-27 22
21 7 17 36-37 20-21 19-23 21
20 7 16 34-35 17-19 15-18 20
19 6 15 32-33 15-16 11-14 19
18 6 5 14 31 11-13 6-10 18
17 13 29-30 9-10 2-5 17
16 5 4 12 27-28 6-8 1 16
15 3 11 25-26 3-5 15
14 4 10 24 1-2 14
13 [illegible] 9 22-23 0 13
12 1 8 20-21 12
11 7 18-19 11
10 2 0 6 16-17 10
9 [illegible] 9
8 1 -4 [illegible] 8


EQUIVALENT SCORES: ARMY OFFICER POPULATION (MEAN 33.3)


Table 17 (c) VOCABULARY TESTS (GROUP)

Equivalent Mill Hill Vocabulary: Set A Mitchell Equivalent
Score Definitions Synonyms Whole Vocabulary Score
45 58 and over 66 and over 122 and over 49 45
44 56-57 64 118-121 48 44 
43 54-55 62 115-117 47 43 
42 52-53 60 111-114 46 42 
41 51 58 108-110 44-45 41 
40 49-50 104-107 43 40 
39 47-48 56 101-103 42 39 
38 45-46 54 97-100 41 38 
37 43-44 52 93-96 40 37 
36 41-42 50 90-92 38-39 36 
35 39-40 48 86-89 37 35 
34 37-38 46 83-85 36 34
33 35-36 44 79-82 35 33 
32 33-34 42 76-78 33-34 32 
31 31-32 40 72-75 32 31 
30 30 38 69-71 31 30 
29 28-29 65-68 30 29 
28 26-27 36 61-64 28-29 28
27 24-25 34 58-60 27 27 
26 22-23 32 54-57 26 26 
25 20-21 30 51-53 25 25 
24 18-19 28 47-50 23-24 24 
23 16-17 26 44-46 22 23 
22 14-15 24 40-43 21 22 
21 12-13 22 36-39 20 21 
20 11 20 33-35 19 20 
19 9-10 18 29-32 17-18 19 
18 7-8 16 26-28 16 18 
17 5-6 22-25 15 17 
16 3-4 14 19-21 14 16 
15 1-2 12 15-18 12-13 15 
14 10 12-14 11 14 
13 8 8-11 10 13 
12 2-6 4-7 9 12 
11 1-3 7-8 11 
10 6 10 




EQUIVALENT SCORES: ARMY OFFICER POPULATION (MEAN 33.3)


Table 17 (d) MISCELLANEOUS TESTS (GROUP)

Equivalent N.I.I.P. Canadian Canadian Equivalent
Score Group Test 33 Figure analogies Classification Score
45 188-193 45 
44 183-187 60 89-90 44 
43 177-182 58-59 87-88 43 
42 172-176 57 83-86 42 
41 166-171 55-56 80-82 41 
40 160-165 53-54 76-79 40 
39 155-159 51-52 73-75 39 
38 149-154 49-50 69-72 38 
37 144-148 48 66-68 37 
36 138-143 46-47 62-65 36 
35 133-137 44-45 59-61 35 
34 127-132 42-43 56-58 34 
33 122-126 41 52-55 33 
32 116-121 39-40 49-51 32 
31 110-115 37-38 45-48 31 
30 105-109 36 42-44 30 
29 99-104 34-35 38-41 29
28 94-98 32-33 35-37 28 
27 88-93 30-31 31-34 27 
26 83-87 28-29 28-30 26 
25 77-82 27 24-27 25 
24 72-76 25-26 21-23 24 
23 66-71 23-24 18-20 23 
22 60-65 21-22 14-17 22 
21 55-59 20 11-13 21 
20 49-54 18-19 8-10 20 
19 44-48 16-17 4-7 19
18 38-43 14-15 1-3 18 
17 33-37 13 0 17 
16 27-32 11-12 16 
15 22-26 9-10 15 
14 16-21 7-8 14 
13 11-15 5-6 13 
12 -10 0-4 12 




Table 18 CONVERSION OF SUMMED EQUIVALENT SCORE (S.E.S. 3) TO OFFICER
INTELLIGENCE RATING

S.E.S. 3   O.I.R.   Verbal description                        Officer      General population
                                                              percentage   percentage
123-       10       Of outstanding intellectual ability       5            10 (O.I.R. 10 to 6 together)
117-122    9        Definitely above the level of the         10
                    average serving officer
110-116    8        Slightly above the level of the           20
                    average serving officer
101-109    7        As intelligent as the average             30
93-100     6        serving officer                           20
87-92      5        Slightly below the level of the           10           20 (O.I.R. 5 and 4 together)
                    average serving officer
81-86      4        Above the level of the average            4
75-80      3        private but not up to the level of        1            20 (O.I.R. 3 to 1 together)
                    the average serving officer
73-74      2        Not appreciably more intelligent          0
71-72      1        than the average private                  0
-70        0        Less intelligent than the average         0            50
                    private


In an attempt to provide figures free from such effects of skewing, tables of 
T-scores (McCall, 26) have been derived from the Garston test results. These 
are given in Table 19. The same caution as recommended in relation to the 
tables of equivalent scores should be observed in the use of these. 
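
The derivation of T-scores may be sketched as follows. McCall's T-scale is a normalized scale with a mean of 50 and a standard deviation of 10: each raw score is first converted to a percentile position and then to the corresponding normal deviate. The mid-rank convention for percentile position used below is an assumption, as is the rounding to whole points, and the function name `t_scores` is hypothetical; the sketch is not offered as the procedure actually followed for Table 19.

```python
from statistics import NormalDist


def t_scores(raw_scores):
    """Map each distinct raw score to a normalized T-score (mean 50, S.D. 10)."""
    n = len(raw_scores)
    if n == 0:
        raise ValueError("at least one score is required")
    unit_normal = NormalDist()
    result = {}
    for value in sorted(set(raw_scores)):
        below = sum(1 for s in raw_scores if s < value)
        equal = sum(1 for s in raw_scores if s == value)
        # Mid-rank proportion: always strictly between 0 and 1.
        proportion = (below + 0.5 * equal) / n
        result[value] = round(50 + 10 * unit_normal.inv_cdf(proportion))
    return result
```

Because the conversion works through percentile positions, not through the raw mean and standard deviation, the resulting scores are not affected by skewing of the raw distribution.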

Certain additional points should also be borne in mind in using the Garston
tables. This sample is of course not at all representative of the general
population, but it is based on one of the most homogeneous officer candidate
samples studied. All candidates were under 20 years of age,¹ and type of school
attended and educational standard attained were carefully balanced for each
intake. While it must be regarded as entirely typical only of an officer
candidate population it may nevertheless be found useful as providing norms for
samples of young men entering upon university education or occupations
requiring similar or only slightly lower levels of ability.

¹ The majority were in fact 18 on their last birthday; of 288 candidates tested 13 were
17+ and 38 19+.


Table 19 T-SCORES: GARSTON DATA

[Table not recoverable from the source. Recoverable column headings include:
T-score; Matrices; Reasoning; Shortened Wechsler; Semeonoff-Vigotsky;
Trist-Hargreaves; T-M-K (A-F); Carl Hollow Square.]


Finally, we have appended, for use with children, a table (Table 20) of age 
norms (median scores only) for two performance tests and 1938 Matrices, 
untimed. The latter are from one of Raven's own papers (36), but it may be 
remarked that a later series of Progressive Matrices specifically designed for 
children is now available (Raven, 37). The norms for T-M-K are a com- 
promise between somewhat conflicting figures derived from various sources; 
those for Carl Hollow Square are adapted from the original author's data. 
Italicized figures are to be regarded as signifying arbitrary Mental Age only, 
for use in calculating quotients, if desired. 


Table 20 AGE NORMS: MATRICES, T-M-K, AND CARL HOLLOW SQUARE

Age       1938 Matrices   T-M-K    Carl Hollow Square
(years)   (untimed)       (A-D)    Male     Female

 8        18                       19       11
 9        23               6       25       17
10        29              10       31       23
11        34              15       37       29
12        38              20       44       36
13        42              25       50       42
14        43              31       56       48
15        43              37       62       54
16                        43       69       61
17                        51       75       67
18                        58       81       73
19                        64       87       79
20                        69       94       86
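
The use of such age norms for calculating quotients may be illustrated by a short sketch. It rests on two assumptions that are not stated in the manual: that a Mental Age between tabled ages is obtained by linear interpolation, and that the quotient is the classical ratio of Mental Age to chronological age multiplied by 100. Only the T-M-K (A-D) medians for ages 9 to 15, as read from Table 20, are used.

```python
# T-M-K (A-D) median scores for ages 9 to 15, as read from Table 20.
TMK_MEDIANS = {9: 6, 10: 10, 11: 15, 12: 20, 13: 25, 14: 31, 15: 37}


def mental_age(score, norms=TMK_MEDIANS):
    """Interpolate a Mental Age (years) from median age norms (assumed linear)."""
    ages = sorted(norms)
    if score <= norms[ages[0]]:
        return float(ages[0])      # clamp below the lowest tabled age
    if score >= norms[ages[-1]]:
        return float(ages[-1])     # clamp above the highest age used here
    for lo, hi in zip(ages, ages[1:]):
        if norms[lo] <= score <= norms[hi]:
            return lo + (score - norms[lo]) / (norms[hi] - norms[lo])


def quotient(score, chronological_age):
    """Classical ratio quotient: 100 x Mental Age / chronological age (assumption)."""
    return 100 * mental_age(score) / chronological_age


print(mental_age(28))      # score of 28 lies midway between the 13 and 14 year medians
print(quotient(28, 12))    # a hypothetical 12-year-old obtaining that score
```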


SUMMARY OF TESTS 


The following synopsis summarizes information on the tests covered in 
the various tables of Equivalent and T-scores (Tables 15, 17, 19). 


I. Table 15: Pemberley Battery


Progressive Matrices and Reasoning are dealt with in the discussion of Table 
17, below, since they formed part of the basic W.O.S.B. battery emanating 
from R.T.C. 

Coding. A letter-digit substitution test, rather similar to the Wechsler sub-test
of the same name, introduced experimentally with the intention of providing a
speed test relevant to the skill required of a radio operator; in use for a limited
time only, but also used, more extensively, in ancillary projects. Based on a
class-room experiment described by Collins and Drever (8), it consisted of
twelve half-minute work periods on a 12-item code, separated by half-minute
rest periods. The score was the total number of substitutions during the entire
period, no check being made on errors. Cognizance could, if desired, be
taken of the amount done in successive work periods, and of the presence or
absence of an end-spurt, induced by the information that the twelfth work-
period was the last.
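
The scoring just described can be set out as a small sketch. The total is as defined in the text; the end-spurt criterion (final period exceeding the mean of the earlier periods) is purely an illustrative assumption, since the manual gives no numerical rule, and the counts are hypothetical.

```python
def coding_summary(period_counts):
    """Return (total score, end-spurt flag) for one candidate's work periods."""
    if len(period_counts) < 2:
        raise ValueError("need at least two work periods")
    # Score as described: total substitutions over all periods, errors unchecked.
    total = sum(period_counts)
    # Illustrative criterion only (not from the manual): the final period
    # exceeds the mean output of all earlier periods.
    earlier_mean = sum(period_counts[:-1]) / (len(period_counts) - 1)
    return total, period_counts[-1] > earlier_mean


# Hypothetical substitutions completed in each of the twelve half-minute periods.
counts = [9, 10, 10, 11, 10, 11, 11, 10, 11, 11, 10, 14]
print(coding_summary(counts))
```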


The Performance tests are of course fully described in Chapters 2-6. 


II. Table 17: W.O.S.B. Battery
(a) Standard written battery 


Progressive Matrices. A non-verbal ‘g’ test, consisting of 60 multiple-choice 
items, arranged in five graded series; originally intended for untimed adminis- 
tration, to which a 45-minute time-limit is virtually equivalent. Use of a 20- 
minute time-limit, later introduced, makes little difference in the nature of 
the task, but discrepancies in individual cases have diagnostic significance 
(see Appendix II). The 1943 version (unpublished) was designed to be more
difficult and therefore better suited to discriminate within an officer popula- 
tion. It contains 38 items, not arranged in separate series, and so not strictly 
‘progressive’. Comment on apparent differences between the three versions 
will be found in Chapter 8, p. 109. 


Verbal Intelligence Test (V.I.T.). An omnibus test, consisting chiefly of verbal 
multiple-choice questions, but also including a few ‘open-ended’ numerical 
items (mainly number series). Time limit 20 minutes for 98 questions, several 
being in two or more parts; speed consequently an important factor. Along 
with the following test, V.I.T. is still in use and therefore restricted. 


Reasoning. An ‘abstractions’ test of the type due originally to Shipley (43). 
The nature of the problems is in general similar to those of V.I.T., but there
is greater variety and instructions are minimal, the subject having to educe 
the form of answer required from the nature of the pattern material (see 
examples quoted in Appendix IV). Forty items graded fairly steeply in 
difficulty; time-limit 20 minutes. 


(b) and (c) ‘Confirmatory’ tests (Individual) and Vocabulary tests (Group). 
Reference has already been made (above) to T-M-K; the other tests in these
two categories are dealt with fully in the appendices. 


(d) Miscellaneous Tests (Group)
National Institute of Industrial Psychology Group Test 33. This widely-used
test, predominantly verbal in character, comprises five sub-tests, separately
timed. Total time is 29 minutes, but may be reduced by 10 minutes if the fifth, 
most difficult, sub-test (18 items only) is omitted; it has been shown that this 
omission is quite negligible in its effect. Total number of items 193. The test 
was used as an alternative. 


Canadian Figure Analogies. A non-verbal test, adapting the ‘analogies’ 
principle to material of the Progressive Matrices type; 60 items, time-limit 
35 minutes. This and the following test were used at Regular Commission 
Appeal Boards. 


Canadian Army Classification Test (Advanced form). An omnibus test of 90 
mixed multiple-choice and open-ended items, closely similar to V.I.T., but 
with a longer time-limit (45 minutes) and consequently less emphasis on 
speed. Considerably more than the usual amount of practice material.


III. Table 19: Garston T-scores

The only test not covered by the two preceding tables is Squares. A spatial 
test of N.I.I.P. origin, having also a fairly high ‘g’ content (see Chapter 8,
p. 111). Consists of 50 items, in each of which a figure has to be divided by a
straight line in such a way that the two portions might be reassembled (after 
turning) to form a square. Not generally used in officer selection, but in- 
cluded in the Garston programme for research purposes. 


SUMMARY OF NORMS 


The following synopsis of norms quoted in this book, their sources, the tests 
covered, and the uses to which they may be put will be found useful. 


I. (Table 14). Percentile norms, intended to apply to the general population, 
but possibly too exacting in the lower ranges; derived from all sources. Per- 
formance tests only, i.e. Semeonoff-Vigotsky, Trist-Hargreaves, T-M-K 
(A-D), with separate norms for the ‘Repeat’, Carl Hollow Square, and Re- 
vised Passalong. 


II. (Tables 15 and 16). Standard Equivalent Scores, scaled to a mean of 30
and S.D. of 5, the mean being that of the Pemberley population, which 
corresponds very roughly to the 70th percentile of the general population. 
May be used as a means of comparing performances of the same individual 
on two or more tests. Conversion table (Table 16) assesses the sum of three
standard scores in relation to a 9-point percentage distribution of the general
population. Performance on a single test may be assessed very approximately 
by multiplying its standard equivalent score by 3 and referring to the con- 
version table. Tests covered are the performance tests, as above; three 
versions of Progressive Matrices; English and French versions of Reasoning; 
and ‘Coding’. 


III. (Tables 17 and 18). Standard Equivalent Scores, as above but scaled to a
mean of 33.3, the mean being where possible that of the serving officer
population, otherwise (less important tests only) that of a W.O.S.B. candidate
population. Table 18 provides for conversion (as in II, above) to an 11-point
distribution of the officer population, and an asymmetrical percentage distri-
bution of the general population. Tests covered are: Progressive Matrices (as
above), Reasoning, V.I.T.; a shortened form of Wechsler-Bellevue consisting
of 3 sub-tests, assessed separately and together; Mill Hill Vocabulary, as a
whole and the two sub-tests separately; a new Vocabulary test; N.I.I.P.
Group Test 33; two Canadian Army tests; of the performance tests only
T-M-K (A-F).


IV. (Table 19). T-scores, i.e. normalized standard scores scaled to a mean of 
50 and S.D. of 10. Mean is that of a superior W.O.S.B. candidate population,
comparable with University entrants or other young men of like educational
attainments. Lower ranges possibly unreliable, but less so than in the case
of II and III above, where most of the distributions were negatively skewed.
Tests covered include all performance tests except Passalong and Repeat of
T-M-K; Progressive Matrices (1943 only); Reasoning; V.I.T.; shortened
Wechsler (as a whole); Squares.


V. (Tables 38 and 39). Standard Equivalent Scores, mean 40, for both variants 
of 1938 Progressive Matrices for 6 age-groups, and for the two sub-tests of 
Mill Hill Vocabulary for 4 age-groups. Conversion table as in III, but for 
the sum of these 4 components. Derived from officer sources. May be used to 
‘correct’ for age, or to assess effects of ageing in the individual. 


VI. (Table 20). Median Mental Age Norms, from various sources, for T-M-K
(A-D), Carl Hollow Square, and Progressive Matrices (1938), untimed. For
use with children, or for assessing superior ability in terms of quotients.
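
The arithmetic behind the Standard Equivalent Scores of II above may be sketched as follows. The sketch assumes a simple linear scaling to mean 30 and S.D. 5 (the manual's tables may embody further adjustments), and the reference mean and S.D. of the raw scores are hypothetical.

```python
def equivalent_score(raw, ref_mean, ref_sd, scale_mean=30.0, scale_sd=5.0):
    """Linear standard score on the scale mean 30, S.D. 5 (assumed construction)."""
    return scale_mean + scale_sd * (raw - ref_mean) / ref_sd


# Hypothetical raw-score mean and S.D. of the reference population on one test.
ref_mean, ref_sd = 42.0, 8.0

# Three tests (here given the same reference values for brevity): the sum of the
# three equivalent scores is what the conversion table is entered with.
three_scores = [equivalent_score(raw, ref_mean, ref_sd) for raw in (50, 42, 38)]
total = sum(three_scores)

# Single test assessed "very approximately": multiply its equivalent score by 3.
single_as_total = 3 * equivalent_score(50, ref_mean, ref_sd)

print(three_scores, total, single_as_total)
```

Multiplying one score by 3 treats the candidate as if all three tests had given the same result, which is why the text describes the resulting assessment as very approximate.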


97 


CHAPTER 8


Statistical Data 


The arrangement of this Chapter is as follows: 

A. Correlations of the performance tests with one another, and with the 
other tests along with which they were developed. 

B. Factor studies.

C. A discussion of the use and interchangeability of the performance 
tests as components of a global ‘Intelligence Grading’. 




As with the normative data contained in Chapter 7, the correlational and 
factorial data here reported are derived from two main sources—the Pember- 
ley assessment procedure and the Garston reliability experiment. Since the 
candidate samples were not comparable, and since even within either sample 
not all candidates were given the same battery of tests, correlation coefficients 
between any pair of tests naturally show considerable variation. In a limited 
number of cases it would have been possible to pool subjects from both 
sources and in this way to obtain correlations based on much larger numbers. 
This we have preferred not to do, since there were marked differences in the 
conditions under which the tests were administered, and in motivation and 
other attitudes on the part of the candidates. Since it would be impossible to
predict how such differences would affect test performances, either in general 
or in relation to a specific individual, it has been considered better to discuss 
discrepant results (where these occur), rather than to attempt to arrive at an 
all-over value. 


A. CORRELATIONS 


Correlations are available not only between the tests which form the main 
topic of this manual, but also with other well-known or restricted or original 
tests of intelligence or of special abilities. Since something is known of the 
factor content of most of the tests they are referred to in this chapter, for 
convenience, as ‘criterion tests’.

98 




STATISTICAL DATA 


The criterion tests used in the two procedures under discussion overlapped 
in respect of two pivotal tests; other tests were used in only one context, and 
in some cases only during part of the time. 


Common to both procedures were: 


(i) Matrices. There was, however, the important difference that whereas at 
Garston the 1943 version was in use throughout, at Pemberley three different 
versions (1938 version with 45-minute time-limit; the same with 20-minute 
time-limit; the 1943 version) were in use at different times. With minor 
exceptions the sequence was: 45-minute, 1943, 20-minute, this last being the 
version used during the greater part of the procedure. 



(ii) Reasoning. Here again there is a divergence to be noted; whereas at 
Garston all candidates were of course tested with the original (English) form, 
English was the preferred language of only a relatively small proportion of 
the Pemberley candidates, and by far the greater number were tested with a 
French version. Substantial numbers also took a Polish version, and somewhat 
fewer a Dutch version. Versions in Danish, German, Hungarian, and Czech 
were given to only small numbers of candidates, and no results from these 
have been used in this chapter (though some are quoted in Table 22). A fuller 
account of these translated versions will be found in Appendix IV. 

Additional criterion tests in use at Pemberley included the following: 

‘Coding’. See Chapter 7, pp. 94-95.

‘Meccano’. A mechanical assembly test devised by E. Anstey, on lines
similar to the A.T.S. ‘Mec’ test (S.P. Test 24, see Vernon and Parry, 46).
The test material consists of four simple Meccano models, two of which are
‘working’ models involving mechanisms. In the version in use at Pemberley
the parts required for each model were laid out on a table. Candidates were
allowed to examine a completed example of the first model for as long as
they liked; the model was then taken away and the candidates were required
to assemble the parts within a stated time-limit. The same procedure was
repeated for the remaining models. Since the test (in a slightly different form)
proved to be of low validity as a test of mechanical aptitude it was not
brought into use in D.S.P.; at Pemberley it was retained primarily for the
opportunity it gave of observing candidates handling fairly small-scale
material. Scores on the Meccano test have been included in this study because
their correlations and the factorial content of the test are of some help in
determining the nature of the tests principally under discussion.

Other tests in use at Pemberley for which full data exist include Morse
Aptitude (S.P. Test 10) and various tests of visual memory and ‘observation’,
but their relevance to the present purpose is only marginal, although since all
have personality implications they might yet repay study by casting light on
the significance of behaviour in the various test situations.

At Garston the following criterion tests were applied: 

The Verbal Intelligence Test (V.I.T.: S.P. 15). See p. 95. Vernon and Parry
describe it as the ‘best’ test of the W.O.S.B. intelligence battery. 

The Shortened Wechsler Verbal Scale, which forms the subject of Appendix I.

‘Squares’ (S.P. Test 4). See p. 96.

In addition, a number of Information tests were applied, but as in the case
of the Pemberley tests mentioned above, and for the same reasons, no use
has so far been made of the results.

Limitations on the number of cases on which correlations have been based
were imposed by the following circumstances:


(i) At Pemberley 
Since the Trist-Hargreaves was throughout regarded primarily as an ‘easier 
alternative’ to Semeonoff-Vigotsky, only a very small number did both, and 
the correlation quoted has only been obtained by including a number of female 
candidates, whose test scores have nowhere else been used in this study. 
Rather similarly, the Passalong was introduced as an alternative to Carl 
Hollow Square, and no cases at all are available of candidates who did both. 
Certain other correlations are absent as a result of the stages in the programme
at which various tests were introduced or dropped. 
(ii) At Garston 
Although performance tests were not an integral part of the programme, 
arrangements were made to have all candidates tested with one of two pairs 
of performance tests (see Chapter 7, p. 81). Since T-M-K was also used in its 
‘confirmatory’ capacity, a few candidates had this test in addition to their 
normal assignment. Unfortunately, however, numbers were not large enough 
to allow for correlations with Trist-Hargreaves or Carl Hollow Square.

In the following pages correlations from the two sources are presented in
the first instance separately. Later (p. 105), comparable correlations are
reassembled for comparison.


I. Pemberley Data 

Since, as has already been mentioned, the Pemberley testing was carried out 
under operational conditions, correlations here quoted are derived from 
sub-samples which differ widely in respect both of size and of range. All raw 
correlations have been collected in Table 21, which shows also the number of 
cases on which each is based. Correlations printed in ordinary type are 
significant at the P = 0.01 level; those significant at the P = 0.05 level are


100 




Table 21 PEMBERLEY TESTS: UNCORRECTED CORRELATIONS 


Trist- Carl 
UT Cod- Semeonof-  Har- T-M-K T-M-K Hollow Passa- Meccano 


45” 193 іш Vigotsky  greaves (Repeat) Square long 
Reasoning. « 4 
English (16) 76 72 (-03) 55 56 55 61 (30) = (18) 
10 31 14 18 20 m 58 56 39 a 
French 12 76 76 48 46 57 66 68 36 (03) 51 
оз әз 60 81 85 $1 — 907 207 ін — т эв 
Polish 52 69 — = - (00) 58 53 32 (43) (12) 
110066 a3 88 B8 [1] 20 16 
Dutch 69 56 - - (47) 60 55 57 59 - 57 
п в 15 23 40 40 34 1 
" — - 52 67 55 T 51 
D E it зт 97 a7 a4 
= - 79 67 09) — 51 
Gera $ O ege a a 2 % 22 
1 — 74 65 — 78 
Hungarian ATAS A 1 % 18 11 n 
00) 27 28 21 - 25 
Coding a И (00) ТА 110 “ na 
; 24) 41 42 38 - 21 
Senol- n (De Qo dm 18 10% 161 
34 41 20 (07) 37 
у) ТЫР 5 161 101 Yo 24 141 
85 57 54 49 
TAE iW {а % 465 330 — w 303 
48 47 
T-M-K 64 62. 52 62 
(Repeat) 107 9268 98 380 4n $02 
Carl Hollow 35 36 34 ма 
Square вз 163 85 
Passalong. -,4 — 2 
48 
Meccano 30 50 56 
106 98 98 
Decimal points omitted 


Blanks indicate less than 10 cases available. 
* Group includes 13 women subjects (see text). 
† Correlations based on equivalent scores for all types of Matrices pooled; numbers for each type as shown.


101 


[Tables 22 and 23 are illegible in this copy. The surviving caption fragments indicate Pemberley correlations of the performance tests with the criterion tests and with one another, in corrected form.]


Not included in the above correlations are data from a sample of 156 cases,
all of whom took an identical battery of tests. This sample, hereinafter re-
ferred to as the ‘X’ sample, has been treated separately in order to provide a
reliability check, and as the basis of one of the factor analyses (see p. 108).
Correlations for this sample are shown in Table 24.


[Table 24 and the text that followed it are illegible in this copy. The last legible fragment reads:]

... n = 74; for those involving B tests n = 105.


DIAGNOSTIC PERFORMANCE TESTS 


Table 26 GARSTON TESTS: CORRELATIONS, CORRECTED FOR RESTRICTION OF RANGE, OF
PERFORMANCE TESTS WITH CRITERION TESTS, AND WITH ONE ANOTHER

                      Semeonoff-   Trist-        T-M-K    Carl
                      Vigotsky     Hargreaves    (A-F)    Hollow Square

Matrices (1943)       .242         .248          .505     .282
V.I.T.                .015         .258          .382     .096
Reasoning             .242         .284          .535     .247
Wechsler              .300         .194          .231     .035
Squares               .143         .133          .532     .357
T-M-K (A-F)           [illegible]
Carl Hollow Square                 .138


It must be emphasized that these correlations are based on very restricted 
populations: the variance on the standard W.O.S.B. tests for this group is 
considerably smaller than in a normal officer candidate population, the 
standard deviations ranging between 3.87 and 4.32 points of equivalent score.
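
The effect of such a restriction on a correlation can be illustrated with the standard univariate correction for restriction of range. This is offered as an assumption: the pages describing the authors' own correction are not legible here, and their procedure may have differed. The correlation and S.D. values below are hypothetical round figures.

```python
import math


def correct_for_restriction(r, sd_restricted, sd_unrestricted):
    """Standard univariate correction for restriction of range.

    r is the correlation observed in the restricted group; the two S.D.s are
    those of the selection variable in the restricted and unrestricted groups.
    """
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1 - r * r + r * r * k * k)


# Hypothetical example: an observed r of 0.30 in a group whose S.D. on the
# selection test is 4 points, against 5 points in the reference population.
print(round(correct_for_restriction(0.30, 4.0, 5.0), 3))
```

With equal S.D.s the formula leaves the coefficient unchanged; the narrower the selected group, the larger the upward correction.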

Uncorrected correlations between criterion tests for the two sub-groups 
(i.e. those candidates who took the A and B tests, respectively) are shown in 
Table 27. Although not strictly relevant to the present purpose these figures 
are included here in order to call attention to the similarity between the two 
groups in respect of all the tests except Matrices, on which the performance of 
the B group appears to have been in some way anomalous. This anomaly has 
of course affected to a still greater degree the uncorrected figures for the total 
group shown in Table 25. These figures are derived from Mitchell's study (see
Appendix III), in which the total number (269) exceeds Groups A and B to-
gether, since a proportion of candidates were not given the complete indi-
vidual testing.


Table 27 GARSTON TESTS: UNCORRECTED CORRELATIONS BETWEEN CRITERION TESTS: 
GROUPS A AND B, AND WHOLE SAMPLE 
Group A Group B Whole sample 
Matrices with V.I.T. .459 (.173) .393
Reasoning * 443 *621 
Wechsler .355 (.000) .271
Squares .318 (.112) .280
V.I.T. with Reasoning .625 .602 .694
Wechsler .454 .433 .521
Squares .299 .207 .314
Reasoning with Wechsler .450 .429 .540
Squares .353 .281 .365
Wechsler with Squares (.227) .251 .258


STATISTICAL DATA 


III. Comparison of figures from both sources

This comparison is concerned with correlations of the performance tests with 
the criterion tests, and of the performance tests with one another. Included 
are data from the main Pemberley population, the Pemberley ‘X’ sample, and
the Garston sample combining Groups A and B. All correlations quoted (in
Table 28, below) are ‘corrected’ correlations, as described on pp. 102 and 103.


Table 28 COMPARISON OF CORRELATIONS FROM PEMBERLEY AND GARSTON POPULATIONS

                                      Pemberley                 Garston
                                Main sample    ‘X’ sample

Semeonoff-Vigotsky
  with Matrices (1943)          .293                          .254
       Reasoning (English)      .565                          .242
       T-M-K                    .419                          [illegible]
Trist-Hargreaves
  with Matrices (20')           [.57?]         .490
                (1943)          .429                          .248
       Reasoning (English)      .342                          .284
                 (French)       .608           .528
       T-M-K                    .357           .452
       Carl Hollow Square       .213           .359           .138
T-M-K
  with Matrices (20')           .697           .512
                (1943)          .538                          .505
       Reasoning (English)      .551                          .535
                 (French)       .697           .542
       Carl Hollow Square       .581           .585
Carl Hollow Square
  with Matrices (20')           .391           .397
                (1943)          [illegible]                   [.28?]
       Reasoning (English)      [illegible]                   [illegible]
                 (French)       [illegible]    .284


Agreement between the two sets of Pemberley figures may be regarded as 
satisfactory, although discrepancies admittedly do occur.

Comparing these with the Garston sample, it will be seen that in five cases 
out of nine the Garston correlations are substantially lower than those from 
Pemberley. It should be noted, however, that in all these cases but one the 
Pemberley figures are based on small groups, and consequently less reliable; 
on the other hand the Garston population is of course highly selected for 
intelligence. For correlations of T-M-K with the written tests the agreement 
is excellent, in spite of the fact that the Garston results are for series A-F, and 
those from Pemberley for series A-D. 

105 


DIAGNOSTIC PERFORMANCE TESTS 


On the whole, considering the differences between the two populations, the 
discrepancies are perhaps not too serious. It may be remarked, however, that 
whereas the variance of the scores in the written tests is much greater in the 
Pemberley than in the Garston population, the difference is generally much
less in respect of the performance tests. This may be interpreted as meaning 
that the performance tests show poor discrimination or poor validity, or 
both, but it is not surprising in view of their susceptibility to the influence of 
attitudinal and other personality variables. 


Summary on correlations

The following table (Table 29) summarizes mean weighted correlations
(calculated using Fisher's z-conversion) for performance tests with combina-
tions of all versions of Matrices and of Reasoning respectively (Pemberley
data only). Passalong is not included, since total numbers were small.


Table 29 MEAN CORRELATIONS OF PERFORMANCE TESTS WITH
POOLED VERSIONS OF MATRICES AND REASONING

                           Matrices            Reasoning
                           r            n      r       n

Semeonoff-Vigotsky         [illegible]  165    .41     148
Trist-Hargreaves           .50          298    .52     312
T-M-K                      .62          619    .59     613
T-M-K (Repeat)             .59          619    .60     613
Carl Hollow Square         .40          487    .35     485
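
A weighted mean correlation of the kind summarized in Table 29 can be computed as in the sketch below. Fisher's z-conversion is as named in the text; the choice of n - 3 as the weight for each coefficient is the conventional one and is an assumption here, since the manual does not state the weights used. The input pairs are hypothetical.

```python
import math


def mean_correlation(pairs):
    """Weighted mean of correlations via Fisher's z.

    pairs is a list of (r, n). Each r is converted to z = atanh(r), the z
    values are averaged with weights n - 3 (assumed), and the mean z is
    converted back to r with tanh.
    """
    weighted_sum = sum((n - 3) * math.atanh(r) for r, n in pairs)
    total_weight = sum(n - 3 for _, n in pairs)
    return math.tanh(weighted_sum / total_weight)


# Hypothetical correlations of one performance test with three versions of a
# written test, each based on a different number of cases.
versions = [(0.55, 40), (0.66, 207), (0.58, 88)]
print(round(mean_correlation(versions), 2))
```

Averaging in the z metric rather than averaging the coefficients directly avoids the bias that arises from the skewed sampling distribution of r.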


These figures may be still more concisely expressed by saying that correla-
tions with the written tests are: for T-M-K (either application) of the order
of 0.6, for Trist-Hargreaves, 0.5, and for Semeonoff-Vigotsky and Carl
Hollow Square, 0.4 and 0.35, the higher coefficient in each case attaching
to the pairing in which both tests have, or both lack, a recognized
space-perception constituent.

These correlations would be a little higher for a general population, but 
even as they stand they give a good all-over picture of the extent to which the 
performance tests may be expected to yield results similar to those of con- 
ventional intelligence tests. 


B. FACTORIAL CONTENT OF THE TESTS 


Vernon (45) quotes factor loadings for certain of the criterion tests, and for
‘Kohs Blocks’ (i.e. T-M-K including series E-F). The following table (Table
30) summarizes the principal loadings (Vernon’s ‘sub-factors’, for which he
does not, in any case, quote loadings, are not included). It will be noted that
Reasoning is missing from this table, but the test listed as ‘Abstraction’
(Naval Group Test 1) is closely similar, consisting of 20 items of a generally
lower level of difficulty.


Table 30 VERNON'S LOADINGS FOR PRINCIPAL TESTS

Loadings
Test g k:m v:ed

Matrices (20') 79 415
V.I.T. 72 99
Abstraction 80 20
Squares 55 46
Kohs' Blocks 50 54


Factorizations of our own data diverge to some extent from these findings,¹
but it must be emphasized that conditions of sampling prevent us from 
placing a great deal of reliance on our results where they differ from those 
quoted above. 


Pemberley factorizations 

Four separate small-scale analyses on the Pemberley data were carried out.
One or other version of Matrices and of Reasoning was included in each,
together with T-M-K and its Repeat. The following table summarizes the
tests included in each battery.


Table 31 TESTS INCLUDED IN PEMBERLEY ANALYSES

Analysis   Matrices   Reasoning   Form-board            Conceptual            Other
                                  test                  test                  test

1          20'        French      Carl Hollow Square    Trist-Hargreaves      -
2          20'        French      Carl Hollow Square    Semeonoff-Vigotsky    -
3          20'        all         Passalong             Trist-Hargreaves      -
4          1943       French      Carl Hollow Square    -                     Coding



The choice of these batteries was mainly determined by the following 
considerations: 
(1) It was desired to include as the basis of each such tests as were common
to all candidates: for this purpose Matrices, Reasoning, and T-M-K were
obvious choices, particularly since something was known of their factor con-
tent. Meccano would have been another possibility but it was felt that it
should be excluded since its ostensible function was that of a special ability
test.

1 Principally, perhaps, because we have not assumed simple structure.

(2) T-M-K Repeat was included as well as T-M-K first application in the 
hope that light might be thrown on the sometimes fairly different correlations 
of these two variants with the other tests (see Table 21, p. 101). The net effect,
however, seems if anything to have been to overweight the batteries with 
Kohs performance. 

(3) Originally it was intended to pair off the spatial and conceptual per- 
formance tests in all four possible combinations. Scrutiny of the records, 
however, showed that insufficiency of numbers for some of the combinations 
would render this impossible. For the same reason it was found impossible 
to use French Reasoning or any other single-language version in the battery 
that included Passalong. Consequently Passalong scores were correlated with
appropriate equivalent scores for Reasoning in all relevant languages. 

Additional points for notice are: 

(i) Coding appears in conjunction with 1943 and not 20-minute Matrices— 
again as a result of administrative conditions obtaining at the time. 

(ii) Analysis 1 is based on the ‘X’ sample (see p. 103). 

(iii) Corrected coefficients are used in analyses 2, 3 and 4, the variance of 
the French Reasoning test being taken as criterion in analyses 2 and 4, and 
that of twenty-minute Matrices in analysis 3. 

Centroid analyses, assuming one general and two bipolar factors in each 
instance, yielded the unrotated loadings shown in Table 32. 


Table 32 FACTOR ANALYSES OF PEMBERLEY TEST BATTERIES: UNROTATED LOADINGS

Factor:                      I           |         II          |        III
Analysis:               1    2    3    4 |    1    2    3    4 |    1    2    3    4

Matrices              714  780  850  741 | -235 -378 -223 -237 | -061 -233 +024 +081
Reasoning             751  851  827  871 | -433 -421 -217 -418 | +178 +233 +139 +187
Coding                               447 |                -326 |                -272
T-M-K                 880  892  870  828 | +369 +109 +291 +306 | +189 -202 -048 +105
T-M-K (Repeat)        834  884  856  877 | +233 +182 +317 +326 | +129 -105 -266 +108
Carl Hollow Square    612  626       617 | +267 +344      +346 | -319 -081      -208
Passalong                       565      |           +240      |           +317
Trist-Hargreaves      614       580      | -204      -411      | -116      -163
Semeonoff-Vigotsky         530           |      +166           |      +389


STATISTICAL DATA 


Proportion of variance

                Analysis
Factor      1     2     3     4
I         549   597   592   557
II        091   085   085   110
III       034   053   037   030
Total     674   735   684   697
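
The relation between the loadings and the proportions of variance may be checked with a short sketch. It assumes the usual definition (the proportion of variance attributable to a factor is the mean of the squared loadings on it over the tests in the battery) and uses the Analysis 1 loadings of Table 32 with decimal points restored; signs do not affect the result.

```python
# Unrotated loadings for Analysis 1 (Table 32), decimal points restored:
# (Factor I, Factor II, Factor III) for each test in the battery.
ANALYSIS_1 = {
    "Matrices":           (0.714, -0.235, -0.061),
    "Reasoning":          (0.751, -0.433,  0.178),
    "T-M-K":              (0.880,  0.369,  0.189),
    "T-M-K (Repeat)":     (0.834,  0.233,  0.129),
    "Carl Hollow Square": (0.612,  0.267, -0.319),
    "Trist-Hargreaves":   (0.614, -0.204, -0.116),
}


def variance_proportions(loadings):
    """Mean squared loading per factor over all tests in the battery."""
    rows = list(loadings.values())
    n_tests = len(rows)
    n_factors = len(rows[0])
    return [sum(row[j] ** 2 for row in rows) / n_tests for j in range(n_factors)]


proportions = variance_proportions(ANALYSIS_1)
print([round(p, 3) for p in proportions], round(sum(proportions), 3))
```

The three proportions and their total agree with the Analysis 1 column of the table above.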


Examination of this table suggests the following conclusions: 


A. In relation to the tests:

(1) No difference exists between the factor structure of the 1943 Matrices 
and that of twenty-minute Matrices. This is of course in accordance with in- 
tention in designing the revised version. The impression that ‘progression’ is 
relatively absent in the series of problems is therefore not confirmed—or, 
rather, it has been demonstrated that the nature of the task is probably un- 
affected. 

(2) There is no observable difference, in terms of factor structure, between 
the two applications of T-M-K. 

(3) Carl Hollow Square and Passalong appear to be equivalent as far as 
the first and second factors are concerned; there is a marked difference in 
relation to the third factor. 

(4) Semeonoff-Vigotsky and Trist-Hargreaves are similar in respect of 
only the first factor. 


B. In relation to the factors: 

(5) First-factor loadings in all tests are substantial and remarkably con-
sistent. T-M-K takes the first place, just above Reasoning. Comment has
already been made on the possible source of this occurrence, but it should be
noted that the Garston analysis placed T-M-K second in a battery in which
only a single application was included (see pp. 108 and 111).

(6) The second factor contrasts T-M-K and the form-board-type tests 
(together with Semeonoff-Vigotsky, which has a somewhat lower loading) 
on the one hand, with the written tests and Trist-Hargreaves on the other. 
If positive polarity is assigned to the former group this may be interpreted 
as a space-perception factor.

(7) Positive and negative signs are less regularly distributed in the third 
factor, but it can be seen to contrast Reasoning with Carl Hollow Square, 
and there is some evidence to support the view that this may be an educational 
factor (not necessarily with its customary verbal component). 

109 


DIAGNOSTIC PERFORMANCE TESTS 


In an attempt to reduce anomalies, a series of perhaps rather arbitrary 
rotations was carried out, the following (Table 33) being apparently the 
maximum consistency that could be achieved. The process involved the 
reversal of the interpretation of signs for Factor II in Analyses 1 and 4, and 
corresponding re-alignment of the two bipolar factors. 

The outstanding anomaly that remains is the absence of a loading in the 
third factor for Matrices in Analysis 2. 

Other noteworthy points are: 

(1) Except in Analysis 2, first-factor loadings in Reasoning and Matrices 
drop definitely below the level of those for T-M-K. 

(2) Coding loses most of its first-factor loading, as does Semeonoff- 
Vigotsky. 

(3) The interpretation of Factor II as a space factor is strengthened. 

(4) The third factor assumes considerably higher proportions; it now 
appears to be strong in the written tests and the conceptual tests, just present 
in Passalong, and absent in T-M-K and Carl Hollow Square. Its ‘educational’ 
nature appears to be reaffirmed; or it may be regarded as a ‘conceptual’ 
factor, as in the analysis of the Mitchell Vocabulary test described in Appendix 
III below. Since efficiency in conceptual thinking has always been regarded as
a function of g, the nature of our first factor is open to question. 

(5) Discrepancies between the pairs of performance tests of like type are 
almost entirely resolved. 


Table 33 FACTOR ANALYSIS OF PEMBERLEY TEST BATTERIES
ROTATED LOADINGS

Factor:              I | II | III
Analysis:            1 2 3 4 | 1 2 3 4 | 1 2 3 4
Matrices             54 89 74 56 | 20 (08) (07) 12 | 48 (03) 47 48
Reasoning            69 84 67 62 | (-07) (-05) 15 (01) | 55 51 53 68
Coding               23 | 31 | 50
T-M-K                87 74 84 89 | 44 54 38 27 | (-06) 14 (06) (01)
T-M-K (Repeat)       79 67 91 89 | 38 57 24 29 | (06) 23 (-09) (01)
Carl Hollow Square   36 37 56 | 63 59 49 | 15 18 (-06)
Passalong            40 | 52 | 21
Trist-Hargreaves     52 50 | 22 (-03) | 44 38
Semeonoff-Vigotsky   25 | 28 | 57


Garston factorizations 
Two further analyses were carried out on the Garston data, each using the 
available ‘criterion’ tests (‘1943’ Matrices, English Reasoning, V.I.T.,
Shortened Wechsler, and Squares), together with the operative combination
of performance tests (see p. 81). Four factors seemed to be necessary to 
account for the variance of these batteries, and the unrotated loadings 
obtained were as in Table 34. 

The general and first bipolar factors seem capable of carrying the same 
interpretation as in the case of the Pemberley analyses, with V.I.T., Wechsler,
and Squares fitting in as expected. The second bipolar factor again appears
to be educational in character; the ‘conceptual’ interpretation loses force
owing to Semeonoff-Vigotsky appearing on the opposite side (as it does in 
respect also of the first bipolar). 

The third bipolar factor principally contrasts V.I.T. and Reasoning with 
Wechsler; loadings in the other tests, with the exception of Semeonoff-Vigotsky,
are mostly negligible. The nature of this factor is not clear, but if
the positive pole is assigned to Wechsler it may be regarded as relating to 
responsiveness in social situations; that Semeonoff-Vigotsky appears on the 
same side seems to bear out this suggestion, since it involves social interaction 
to a much greater extent than either T-M-K or Carl Hollow Square or even 
Trist-Hargreaves. 


Table 34 GARSTON FACTOR ANALYSES; UNROTATED LOADINGS

Factor:              I | II | III | IV
Analysis:            A B | A B | A B | A B
Matrices             653 656 | -043 -085 | -186 -308 | -106 039
V.I.T.               683 703 | -151 -276 | 404 200 | -260 -256
Reasoning            871 861 | -106 -140 | 078 032 | -159 -114
Wechsler             608 550 | -306 -366 | 107 151 | 334 22
Squares              490 532 | 521 409 | 241 248 | 083 060
T-M-K                716 | 458 | -240 | -084
Carl Hollow Square   383 | 509 | -250 | -057
Semeonoff-Vigotsky
Trist-Hargreaves     344 | -047 | -091 | 038


Proportion of variance

            Analysis
Factor      A      B
I           407    359
II          107    095
III         070    041
IV          038    020
Total       622    515




Tentative rotations of axes for these analyses are not reported, since the 
pattern of loadings contains no serious anomalies, and since rotation between 
the general and the third bipolar factor would add to some of the small load- 
ings which should probably be regarded as genuinely approximating to zero. 
This fourth factor may perhaps also be regarded as a ‘true bipolar’, since 
‘detachment’ (as the opposite of social participation) is likely to favour per- 
formance in the written tests, including Matrices. 


Summary on factor content

It would appear that the nature of our performance tests is broadly similar 
to what was to be expected. Nevertheless, commonly accepted primary factors 
do not seem to be sufficient to account for all the observed variance, at any 
rate when dealing with selected populations. It is suggested that the remaining 
factors, if such there be, should be regarded as reflecting personality variables, 
rather than as strictly cognitive in character. 

This view is borne out by some unpublished work by Semeonoff and Cook- 
son, interim reports of which were read at meetings of the British Psycho- 
logical Society at Nottingham in 1953, and of the Scottish Branch in 1955 
(Semeonoff, 40). The Time and Clues components of Semeonoff-Vigotsky, 
taken separately, were analysed in batteries combining cognitive and personality
measures, and were shown to carry markedly different loadings in
the factors extracted.

As regards the pairs of ‘like’ performance tests, it would appear that 
although the correlations yielded by Carl Hollow Square and Passalong 
differ considerably, their factor structure is very similar. This is less true in 
the case of Semeonoff-Vigotsky and Trist-Hargreaves. Further investigation 
of this point would probably be more fruitfully carried out in a diagnostic 
or clinical setting. 


C. RELIABILITY OF THE I.G. CONVERSION


As indicated earlier, part of the purpose of having available at Pemberley a 
range of performance tests of contrasting types was to allow candidates the 
maximum opportunity of ‘doing themselves justice’ on tasks which it was 
thought would be most suited to their special abilities. Since there was 
obviously no method available for assessing such suitability, other than 
short-term judgment on the part of the Board psychologists, there was 
clearly room for ‘error’ in choosing which tests to apply to any given candidate.
Or, even setting aside considerations of whether the notion of a ‘maximum’
estimate of intelligence is tenable, there is the question of the extent
to which composite assessments based on differing combinations of tests
included in the battery will vary. Knowing the correlations of the tests with 
one another, it is of course possible to calculate correlations of the various 
possible combinations with one another, or with any single test that might 
be chosen as a criterion. Use could then be made of regression equations to 
ensure optimum prediction. 

In the Pemberley investigation, however, where suitable tests had to be 
developed pari passu with their application, such a course was impracticable; 
it would have been only a degree less so in W.O.S.B.s, where already 
standardized tests were used, for the most part, but where fresh tests had 
sometimes to be substituted, or confirmatory tests applied, as circumstances 
demanded. Furthermore, in selection work involving intensive study of 
relatively small numbers of subjects, interest centres in the individual case 
rather than in statistical prediction. 

In order, therefore, to provide information on the incidence and magnitude 
of possible ‘error’ in global assessment on a percentile point-scale, a varied 
sample of results from Pemberley candidates was examined, Intelligence 
Gradings based on different combinations of test scores being compared. 

Although the standard Pemberley procedure was to combine scores on 
two unvarying tests with a composite on those from three (occasionally two) 
performance tests, it was thought advisable to investigate discrepancies 
between less homogeneous batteries. Accordingly, alternative intelligence 
gradings were calculated based on the following data: 

1. The ‘standard’ battery, yielding the ‘operational’ I.G. 

2. A ‘written’ battery, consisting of Matrices and Reasoning, together 
with Coding, when this had been given. 

3. A ‘performance’ battery, or batteries: when more than three test 
scores were available—as in the majority of cases, since all candidates were 
given the Repeat as well as the main T-M-K—all possible combinations were 
taken three at a time. 

4. Where candidates had originally been assessed in terms of ‘grades’ 
instead of equivalent scores (see Chapter 7, pp. 81-82) these I.G.s were 
also recorded. 
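
The enumeration described under method 3 may be sketched as follows. This is an illustration only: the function name `performance_batteries` and the scores shown are hypothetical, and the use of a simple mean as the composite is an assumption, since the manual's own method of combining equivalent scores is described elsewhere.

```python
from itertools import combinations


def performance_batteries(scores, size=3):
    """Return every `size`-test subset of a candidate's performance scores."""
    names = sorted(scores)
    return [{name: scores[name] for name in combo}
            for combo in combinations(names, size)]


# Illustrative equivalent scores only; these values are not from the manual.
candidate = {
    "T-M-K": 42,
    "T-M-K (Repeat)": 45,
    "Carl Hollow Square": 38,
    "Trist-Hargreaves": 40,
}

for battery in performance_batteries(candidate):
    # Assumption: the composite is taken here as a simple mean of the three scores.
    composite = sum(battery.values()) / len(battery)
    print(", ".join(battery), "->", round(composite, 1))
```

With four available scores there are four possible batteries of three, each of which would yield its own performance-based grading.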

It may be remarked that all these methods of assessment gave nearly 
identical distributions of I.G.s on the nine-point scale. The only distribution 
appreciably different (but quite insignificantly so, P > 0.3) was that of
method 4, which of course indicates faulty standardization rather than true 
discrepancies. 

Four groups of candidates, seen at different stages of the Pemberley procedure,
were included in the study. The constitution of the sample is noted
in Table 35.


Table 35 PEMBERLEY SAMPLE ANALYSED FOR DISCREPANCIES IN INTELLIGENCE GRADING

Group   n     Version of   Coding      Performance tests taken
              Matrices
I       97    45-minute    not given   Carl Hollow Square, Semeonoff-Vigotsky
II      55    20-minute    given       Carl Hollow Square, Semeonoff-Vigotsky,
                                       Trist-Hargreaves
III     43    1943         given       Carl Hollow Square, Semeonoff-Vigotsky,
                                       Trist-Hargreaves
IV      123   20-minute    not given   Carl Hollow Square, Semeonoff-Vigotsky (few),
                                       Trist-Hargreaves, Passalong (few)


In the following comparison of the resultant gradings, any discrepancy of 
one step on the nine-point scale has been regarded as negligible. This con- 
vention, which of course has the effect of exaggerating the amount of agree- 
ment, has been adopted to conform with the attitude, adopted in practice, of 
considering, for example, a ‘high 7’ as interchangeable with a ‘low 8’, since 
a shift of this magnitude might, in the individual case, be due to an ‘unlucky’ 
(or ‘lucky’) combination of approximations. 

Taking as criterion the ‘operational’ I.G., as determined by method 1, 221 
cases out of 318 showed zero or negligible discrepancy. This figure amounts 
to just under 70 per cent and is substantially the same for all groups (see 
Table 36). The poorest result was given by Group III (i.e. Coding included, 
and 1943 Matrices). If one excludes discrepancies arising from the use of 
‘grades’ (method 4) and combinations of performance tests other than the 
‘operational’, the number showing zero or negligible discrepancy rises to 
259, or 81 per cent. 
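
The two percentages quoted above can be checked directly from the counts given in the text. The helper name `percentage` is merely illustrative.

```python
def percentage(cases, total):
    """Share of `cases` in `total`, expressed as a percentage."""
    return 100 * cases / total


TOTAL_CASES = 318

# All methods of assessment compared with the 'operational' I.G.
print(round(percentage(221, TOTAL_CASES), 1))

# Method 4 and non-operational performance combinations excluded.
print(round(percentage(259, TOTAL_CASES), 1))
```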


Table 36 DISCREPANCIES IN INTELLIGENCE GRADINGS

                        Discrepancies
               zero or            3 points
Group   n      negligible         or more
               No.     %          No.     %
I       97     66      68         10      10
II      55     40      73         4       7
III     43     25      58         3       7
IV      123    90      73         5       4
All     318    221     70         22      7




Discrepancies of as many as three points from the criterion were found in 
22 cases in all; these again were distributed fairly evenly (see Table 36). The 
highest such discrepancy was 4 points (4 cases). 

Examination of the ‘performance battery’ taken separately showed that of 
the 318 cases analysed, 162 had more than three performance scores, thus 
yielding data for more than one assessment on the basis of performance tests 
alone. The criterion chosen in this case is again the ‘operational’ combination, 
i.e. the first application of T-M-K, together with whichever other tests had 
been applied. Of these 162 cases, 133 showed ‘no’ discrepancy (as previously 
defined), and only 7 showed as many as 3 points total discrepancy (i.e. differ- 
ence between maximum and minimum grading). 

Total discrepancies over all types of grading were distributed as follows: 

Discrepancy of 6 points: 2 cases 

Discrepancy of 5 points: 2 cases 

Discrepancy of 4 points: 9 cases 

Discrepancy of 3 points: 44 cases 

A separate study, not previously reported, was later carried out to examine 
discrepancies between composites based on the two standard written tests 
and one performance test, weighted equally with the written tests. These are 
shown in Table 37. 


Table 37 DISCREPANCIES IN INTELLIGENCE GRADINGS BASED ON TWO WRITTEN TESTS AND
ONE PERFORMANCE TEST

Points                    Group
Discrepancy    I     II    III   IV    Total   %
4                          1           1       0.3
3              3     1     3     3     10      3
2              16    12    8     18    54      17
1              50    27    22    55    154     48
0              28    15    9     47    99      31

The net conclusion would seem to be that where batteries of tests sufficiently
different in character (as evidenced by moderately high positive correlations) 
to be worth the expenditure of time are administered, and where not all indi- 
viduals are given the same tests, global assessments will be substantially the 
same in, very roughly, three cases out of four. The reader is invited to decide 
for himself whether, on statistical grounds, this is a satisfactory figure. In the 
present context, however, the interest lies in the remaining cases; it is 
suggested that adequate understanding of their cognitive functioning can 
only be attained following detailed individual study. 
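
As a rough check on the ‘three cases out of four’ estimate, the Total column of Table 37 can be tallied as below. The variable and function names are illustrative, and treating a discrepancy of at most one point as ‘substantially the same’ follows the convention adopted earlier in this section.

```python
# Total column of Table 37: points of discrepancy -> number of cases.
table_37_totals = {4: 1, 3: 10, 2: 54, 1: 154, 0: 99}

n_cases = sum(table_37_totals.values())


def share_within(max_points):
    """Percentage of cases whose discrepancy does not exceed `max_points`."""
    cases = sum(n for points, n in table_37_totals.items() if points <= max_points)
    return 100 * cases / n_cases


print(n_cases)                     # size of the whole sample
print(round(share_within(1), 1))   # zero or negligible discrepancy
```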



Appendices 


APPENDIX I 


THE SHORTENED WECHSLER VERBAL SCALE 


The testing procedure at W.O.S.B.s allowed for ‘confirmatory’ tests to be 
administered individually to candidates whose performance on the standard 
battery of written tests might appear to present anomalies. Circumstances 
recognized as indicating the advisability of additional tests were as follows: 

(i) when marked discrepancies appeared among the equivalent scores on 
the standard battery; 

(ii) when the O.I.R. as measured by the standard battery appeared to be
at variance with the evidence from the candidate’s personal history, e.g. his
educational or occupational record;

(iii) when a candidate of low O.I.R. appeared to be shaping well on other
parts of the course.

Two additional tests were administered in such cases: T-M-K, series A-F
(see Chapter 4, p. 46), and a specially standardized short version of the verbal 
part of the Wechsler-Bellevue scale (Wechsler, 47), comprising three sub- 
tests—Comprehension, Similarities, and Vocabulary. 

Equivalent scores for each of the two tests were derived: an ‘amended’
O.I.R. for a given candidate was computed by adding these scores to the
equivalent scores already obtained for the tests in the standard battery,
multiplying by 3, and dividing by 5. Amended O.I.R.s seldom differed from
unamended by more than one grade, but this might of course make all the
difference between an acceptance and a rejection disposal. Of 93 candidates
given confirmatory testing at the No. 14 W.O.S.B. reliability experiment, 16
gained one grade, 22 dropped one grade, and one dropped two grades; the
correlation yielded by these figures is 0.936.
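
The computation of the ‘amended’ O.I.R. may be sketched as follows. The function name and the scores are hypothetical; the sketch assumes that the standard battery contributes three equivalent scores, which is what the factor of 3/5 suggests but is not stated explicitly here.

```python
def amended_oir(standard_scores, confirmatory_scores):
    """Add the confirmatory scores to the standard ones, then scale by 3/5."""
    total = sum(standard_scores) + sum(confirmatory_scores)
    return total * 3 / 5


# Illustrative equivalent scores only (assumed three standard tests, two confirmatory).
standard = [30, 32, 28]
confirmatory = [35, 33]   # T-M-K (series A-F) and Shortened Wechsler
print(amended_oir(standard, confirmatory))

# Grade changes among the 93 candidates reported above.
gained_one, dropped_one, dropped_two = 16, 22, 1
unchanged = 93 - gained_one - dropped_one - dropped_two
print(unchanged)
```

On the figures quoted, the remaining candidates, those neither gaining nor dropping a grade, number 54 of the 93.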

The value of the procedure, however, lay not only in increased validity of 
the ‘amended’ O.I.R., but in the opportunity it afforded for personal contact
between the tester and the candidate—that feature of the testing situation, 
in fact, the importance of which constitutes a main theme of this book. 

Correlations of the Shortened Wechsler Verbal Scale with other tests are 
noted in Chapter 8, pp. 103-104; its factorial content is briefly discussed in 
the same Chapter, p. 111. 



RELATION TO THE WECHSLER-BELLEVUE SCALE 


The Shortened Wechsler Verbal Scale differed in no material way from the 
portions of Wechsler’s original scale from which it was adapted. Wechsler’s
intention was to provide a series of tests acceptable to adult subjects that 
could be administered individually in a fairly short space of time. His scale 
consists of ten sub-tests, five (the ‘Verbal’ scale) predominantly verbal in 
character, and requiring only spoken answers from the subject, and five 
(constituting the ‘Performance’ scale) that involve manipulation of material
(in one case merely paper and pencil). In addition, the Vocabulary test
(one of the three used in the W.O.S.B. adaptation) was originally available
as an ‘alternate’, but was later recommended for universal use.

Wechsler calls attention to the value of vocabulary as a measure of general 
intelligence, to the fact that it is less affected by cultural and educational 
background than is commonly supposed, and to the scope for diagnostic 
indications inherent in the variety of possible ways in which the subject may 
choose to frame his definitions. From the point of view of scoring, however, 
as Wechsler says, ‘any recognized meaning is acceptable, and there is no 
penalty for inelegance of language.’ (This standpoint was also maintained, 
incidentally, in the W.O.S.B. scoring method for the Mill Hill Vocabulary 
Test; see Appendix II, p. 125). 

These considerations commended the Vocabulary sub-test for inclusion in 
the desired shortened scale, which was designed, in general, to provide a 
brief but reliable form of individual test that should: 

(i) approximate fairly closely to Stanford-Binet conditions of adminis- 
tration, since these represent the prototype of all intelligence testing on an 
individual basis; 

(ii) consist of ‘adult’ material; and 

(iii) rest upon already available standardization. 

The ‘Performance’ portion of the scale was ruled out as generally irrelevant 
to and unsuitable for the purpose in hand; and much the same could be said 
to apply to ‘Memory Span for Digits’. Of Wechsler’s ‘Verbal’ tests, ‘Infor- 
mation’ was too specifically American in content to permit of ready adapta- 
tion; ‘Arithmetical Reasoning’ (besides being largely concerned with reckon- 
ing in dollar currency) seemed to be adequately covered by other parts of the 
normal testing programme. The three appropriate sub-tests thus seemed to 
select themselves, as follows: 

(a) Comprehension 

(b) Similarities 

(c) Vocabulary. 


MODIFICATIONS INTRODUCED FOR USE AT W.O.S.B.S 


Only a few changes, of a minor kind, were found necessary, either because 
of cultural differences, or to allow for variations in usage. In the Compre- 
hension sub-test, six of the ten questions required re-wording; Similarities 
and Vocabulary each required to be amended in respect of only one item. 
In Vocabulary, however, the order of presentation of the words was altered 
to conform to the order of difficulty indicated by a preliminary try-out. 


ADMINISTRATION AND SCORING 


Since it is not intended that the ‘shortened’ scale should be used as an
alternative to the full Wechsler-Bellevue Scale,¹ the full Instructions used at
W.O.S.B.s are not reproduced here. These closely followed Wechsler’s
‘General Instructions’ (47, pp. 17-18). Scoring also followed the normal
practice of allowing full, partial, or zero credit according to the quality of the 
response, with reference to the ‘Criteria sheets’ in cases of doubt. 

The requirements of the W.O.S.B. selection procedure made relation to 
general population standards unnecessary, though this was obtained towards 
the end of the war on General Service intake; scores on the three sub-tests 
assessed separately, and on the Shortened Scale as a whole, were therefore 
incorporated into the general system of equivalent scores. Since, however,
it was considered convenient to make it possible to continue to assess in
terms of figures comparable with Wechsler’s own, the sum of weighted scores
on the three sub-tests (47, pp. 187-9) was multiplied by 5/3, thus yielding
figures directly comparable with weighted totals for Wechsler’s ‘Verbal’
scale. It was shown that I.Q.s derived from this adjusted score were within
4 points of Wechsler’s ‘Verbal I.Q.’. Vice versa, it may be assumed that ‘full’
Wechsler Verbal Scale scores may be related at a fairly close level of
approximation to equivalent performance on the other tests covered by the
tables of equivalent scores.
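
The 5/3 adjustment amounts to scaling a three-sub-test total up to the five sub-tests of Wechsler’s ‘Verbal’ scale. A minimal sketch follows; the function name and the weighted scores are illustrative only, and the conversion of the adjusted total to an I.Q. (which requires Wechsler’s own tables) is not attempted.

```python
def adjusted_verbal_total(weighted_subtest_scores):
    """Scale the sum of three weighted sub-test scores by 5/3."""
    if len(weighted_subtest_scores) != 3:
        raise ValueError("expected Comprehension, Similarities and Vocabulary")
    return sum(weighted_subtest_scores) * 5 / 3


# Illustrative weighted scores only; not taken from the manual.
print(adjusted_verbal_total([12, 11, 13]))
```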

It should be noted, however, that the practice at W.O.S.B.s was to report
results in terms not of I.Q.s (whether Wechsler Verbal I.Q.s or otherwise)
but of equivalent scores, as explained above. Consequently there was never
any danger that results derived from the ‘shortened’ scale might be mistakenly
understood to have been derived from the entire Wechsler-Bellevue Scale.
Similarly, since standardization of ‘shortened scale’ performance in terms of
equivalent scores was carried out on a purely ad hoc basis, the risk of error
due to unrecognized non-equivalence of amended and original items was of
no practical importance.

¹ Since this was written the Wechsler-Bellevue Scale itself has been largely superseded by
the introduction of the Wechsler Adult Intelligence Scale (48), of which a version modified
for use in Great Britain may be had from the National Foundation for Educational Research
in England and Wales, 79 Wimpole Street, London, W.1.




APPENDIX II 


THE INTERPRETATION OF EQUIVALENT SCORES 
CORRECTED FOR AGE 


—————————— 


As stated in the Preface, the material of this chapter has been included on 
grounds of general relevance to the main theme of this manual, and in par- 
ticular because it presents a further unpublished part of the work of No. 25 
W.O.S.B. within the framework of which the performance tests were de- 
veloped. 

Summarizing the results of pre-war research on the relation between age 
and test performance, Vernon and Parry (46, pp. 188 ff.) write: 


‘Between the twenties and sixties there is a progressive decline on tests on g
involving abstract reasoning and speed of mental manipulation, though on 
other tests of what has been called “crystallized” intelligence, such as vocabu- 
lary and information, the level is better maintained.’ 


The findings of several large-scale Services investigations, using general 
intake or recruit samples, which they quote, bear out the first part of this 
statement, but little information is available regarding tests which ‘hold up’ 
with age. 

The testing procedure described in this chapter was evolved for use at 
W.O.S.B.s (Officers). These were in effect re-allocation units, at which
officers who were for any reason regarded as misfits were re-assessed with a
view to re-employment or release. A high proportion were older men, many
of whom had not been previously tested; others would undoubtedly have
already been through selection procedures. Since it was necessary to use
tests for which officer norms were already available, Progressive Matrices
was used as the main instrument of testing. This test had also the advantage
that comparable figures existed for both the 45-minute (virtually untimed)
and the 20-minute versions, and a method of administration allowing for
scores to be obtained under both conditions was adopted. The Mill Hill
Vocabulary Test (Set A) (Raven and Walshaw, 38) was used as a comple- 
mentary test of ‘crystallized’ intelligence, and a revised method of scoring, 
allowing for ‘partial credit’, as in the Wechsler Vocabulary test, was intro- 
duced for Part I (Definitions). 



PROGRESSIVE MATRICES: TIMED AND UNTIMED 


The ‘double’ method of application used was as follows: Papers were dis- 
tributed in the usual way, along with coloured pencils, and the normal intro- 
duction for the 20-minute version was given. At the end of twenty minutes 
the coloured pencils were taken in, and candidates were told that the ‘next 
test’ was intended to find out how different people were able to make use of 
the further time that would now be allowed; they were to finish the series of 
test problems, if possible, or to look over and amend, if desired, what they 
had already done. Ordinary pencil or pen was used during this part of the 
test, so that what had been done during the two periods could readily be dis- 
tinguished. 


Comparison of scores on the two forms of the test may be expected to be 
informative in two ways. First, the 20-minute form may be said to give scope 
for a rapid response to a new situation, whereas the 45-minute form may be 
regarded as analogous to a relatively familiar problem, or at any rate a 
problem to which there has been some degree of adaptation. The second and 
probably more important point is in relation to motivation. In the circumstances
obtaining at W.O.S.B. (Officers), motivation during the second part
of the test (as well as in general) tended to be low. This was perhaps combated 
to some extent by introducing the extra period as a ‘separate test’, but per- 
severance in this situation tends in any case to flag, and inferences regarding 
staying-power as well as the quality of the individual’s cognitive processes 
may legitimately be drawn. Examples of interpretation follow the tables of 
Equivalent Scores (Tables 38-39, pp. 127-30). 


MILL HILL VOCABULARY 


As with other tests, rigid adherence to a specified form of words in intro- 
ducing the test is not recommended. At W.O.S.B.s the following introduction 
was given; it may be varied to suit circumstances, provided that the essential 
information is conveyed: 


‘In the first part of this test you have to write down in a few words the mean- 
ing of a given word. In the second part you have to choose and underline 
the word which means the same or nearly the same as the given word. If you 
are not certain of the word to underline choose the one which you think is 
the nearest in meaning to the given word.’ 




For group administration a time-limit of 15 minutes is customarily im- 
posed; subjects should be told of this, and also that a warning will be given 
five minutes before time is up. Since the element of speed is not intended to 
play any part, the time limit need not be strictly observed if the test is being 
administered individually. In group administration, questions should be 
asked for before giving the signal to start. The only question likely to cause 
doubt is whether definition by example is permitted. To this the answer is 
that it is not really satisfactory. 


Scoring 

Standard practice in the Mill Hill Vocabulary Test (Raven and Walshaw, 38) 
allows one point for each correct response in either part of the test. Experience 
at W.O.S.B.s suggested that superior discrimination (and probably increased 
validity) could be obtained by allowing ‘partial credit’ for definitions which 
fell short of full adequacy, but which nevertheless indicated that the candidate 
possessed at least a reasonable working knowledge of the connotation of a 
given word. A Scoring Key based on a scrutiny of 360 cases was prepared, 
embodying the definitions given in Chambers’s Twentieth Century Dictionary, 
together with as comprehensive a selection as possible of pattern responses 
which should rate ‘full’ credit (2 points), ‘partial’ credit (1 point), or ‘reject’ 
(nil). 

Two further rules were provided for guidance in scoring: 

(i) When a definition contains two or more parts which fall into different 
credit categories, the higher (or highest) should be scored: e.g. Shrivel: ‘to
wither and close up’ would score 2 points for ‘to wither’, rather than 1 point
for ‘close up’.

(ii) Definition merely by example, whether using the form of word supplied 
or a grammatical variant, scores nil. 

e.g. Conciliate: ‘to conciliate an enemy’
Prosper: ‘to be prosperous’. 

The ‘possible’ score for Part I (Definitions) thus becomes 68 points, and
to balance this each correct response in Part II (Synonyms) is scored 2 points.
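
The scoring rules above may be summarized in a short sketch. All names are illustrative; the assignment of a credit category to each part of a definition is assumed to have been made already by reference to the Scoring Key, which is not reproduced here. A ‘possible’ of 68 at 2 points each implies 34 words in Part I.

```python
FULL, PARTIAL, REJECT = 2, 1, 0   # credit categories, in points


def score_definition(part_credits):
    """Rule (i): score the highest credit earned by any part of the definition.

    Under rule (ii) a definition merely by example is entered as REJECT.
    """
    return max(part_credits, default=REJECT)


def score_mill_hill(definitions, synonyms_correct):
    """Return (Part I, Part II, total) under the W.O.S.B. partial-credit method."""
    part_one = sum(score_definition(parts) for parts in definitions)
    part_two = 2 * synonyms_correct   # each correct synonym scores 2 points
    return part_one, part_two, part_one + part_two


# 'to wither and close up': 2 points for 'to wither', 1 for 'close up'.
print(score_definition([FULL, PARTIAL]))

# Three illustrative definitions and 20 correct synonyms (hypothetical candidate).
print(score_mill_hill([[FULL, PARTIAL], [REJECT], [PARTIAL]], 20))
```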


Standard Equivalent Scores 
Equivalent scores for the Mill Hill Vocabulary were calculated for each part
separately, and for the test as a whole. At W.O.S.B.s (Officers) equivalent
scores for the two parts were added to those for the two forms of Matrices,
and conversion to the normal O.I.R. scale was effected by taking three-fourths
of this total. In other connexions the figures for the test as a whole
were used, and this plan may of course be followed if it is desired to compare
Mill Hill Vocabulary performance with that on other tests covered by the
tables.
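
The conversion used at W.O.S.B.s (Officers) may be sketched as below. The function and parameter names are hypothetical, and the equivalent scores shown are illustrative rather than drawn from the tables.

```python
def officers_oir(matrices_20, matrices_45, definitions, synonyms):
    """Three-fourths of the sum of the four equivalent scores."""
    return 3 * (matrices_20 + matrices_45 + definitions + synonyms) / 4


# Illustrative equivalent scores only.
print(officers_oir(matrices_20=40, matrices_45=42, definitions=38, synonyms=36))
```

Taking three-fourths of a four-score total has the same effect as the 3/5 factor applied to five scores in Appendix I: the result is expressed on the scale of a three-score total.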



Some implications of a comparison of equivalent scores for the two parts 
will be found in the examples. In general, a higher equivalent score on 
Synonyms may indicate either successful guessing or poor powers of expres- 
sion in Definitions. Further evidence on this point may emerge from an 
examination of the definitions themselves; thus, a high incidence of 1-point 
credits, indicative of poor fluency, or of poor comprehension, may be 
associated with a much higher Synonyms score. (The nature of incorrect 
choices in Synonyms may also be of interest.) Lower equivalent scores in 
Synonyms suggest unwillingness to commit oneself. Lower raw score on 
Synonyms is very rare indeed, and may indicate either some form of dis- 
turbance or a deliberate attempt to score low. 

The equivalent scores for the two forms of Matrices were derived from a 
composite sample of about 1,500 cases from various sources—candidates 
accepted at No. 1 W.O.S.B. in 1942, all taking the tests for the first time, re- 
allocation candidates as they became available, and—for the 45-minute 
version only—serving officers who had taken the test along with recruits at 
Primary Training Centres or Wings. While each sample had its peculiarities 
which prevented it from being fully representative of the officer population, 
these on the whole balanced one another out, and the net result was a series 
of norms based on a genuine population, rather than on a stub of a general 
population, from which 20-minute Matrices norms had previously been de- 
rived. 


It should be noted, however, that evident anomalies in Matrices figures for 
the highest age group of all are probably due to uncontrollable conditions of 
sampling (e.g. the possibility that less intelligent older men were relatively 
lacking in the officer populations at large). The same applies to the trend (not, 
of course, noted in the tables) for Vocabulary scores to continue to rise in the 
fifties or even higher. Information regarding the nature of the standardization 
sample for Vocabulary is insufficient to allow for an explanation. While the 
usefulness of norms based on officer populations, in this context possibly even 
more than in the broader one covered by Chapter 7, is limited, the general 
nature of the officer population, i.e. somewhat negatively skewed within
(roughly) the upper half of the population at large, is sufficiently well defined 
to make assessment within its framework meaningful. 


ILLUSTRATIVE CASE MATERIAL 


The following illustrative examples of interpretation of test scores dis- 
cussed in this Appendix are extracted from an R.T.C. ‘Instruction’. A key 
consideration in this context was whether candidates were suitable for 




Table 38 EQUIVALENT SCORES CORRECTED FOR AGE (MEAN 40) 
(a) MATRICES, 20-MINUTE 
Equivalent Age groups Equivalent 
score score 
To 24 25-29 30-34 35-39 40-44 45- 
3 6 © "d Жы з 
5 5 
51 59 58 58 9 58 Я 
50 58 37 37 38 58 56-57 50 
49 57 56 56 57 56-57 54-55 49 
48 56 55 55 55-56 54-55 5 48 
47 55 54 54 53 51-52 47 
53 33 33 51-52 5 
45 33 32 2 52 29-50 45 
2 En s 51 48 47 
43 si 50 50 49-50 46-47 45-46 43 
42 50 49 49 48 44-45 42 
41 29 48 48 47 43 42-43 41 
40 48 47 47 46 41-42 4l 40 
39 47 46 45 39-40 39 
38 45 45 43-44 38 37-38 38 
37 44 42 36-37 37 
36 45 43 43 41 34-35 34-35 36 
35 42 42 40 33 35 
34 43 41 40-41 39 31-32 31-32 м 
33 42 39 37-38 29-30 33 
32 41 39 38 36 28 28-29 % 
31 38 37 35 26-27 26-27 31 
30 39 37 36 34 24-25 5 30 
29 38 35-36 35 33 23 23-24 29 
28 37 4 34 31-32 21-22 22 
27 36 33 ЕН 30 19-20 20-21 27 
26 35 32 32 29 18 26 
25 34 31 31 28 16-17 17-18 25 
24 33 30 30 27 14-15 16 24 
23 32 29 29 25-26 14-15 2 
22 31 28 28 24 11-12 12-13 2 
21 30 27 27 23 0 21 
20 29 26 26 2 9-10 20 
19 8 25 25 21 6-1 19 
18 37 24 24 20 4-5 6-7 18 
17 26 23 23 18-19 3 4-5 17 
16 25 2 17 12 16 
15 24 21 20221 16 1-2 15 
4 23 20 19 15 14 
13 22 19 18 14 13 
it % 5 АТ it 
10 19 16 15 10 10 
9 18 15 14 9 9 
8 17 1% 13 8 8 
1 16 13 12 7 7 
6 15 12 11 5-6 6 
5 14 10-11 10 4 5 
4 13 9 9 3 4 
3 12 8 8 2 3 
2 11 7 7 1 2 
1 10 6 6 1 




DIAGNOSTIC PERFORMANCE TESTS 


Table 38 EQUIVALENT SCORES CORRECTED FOR AGE (MEAN 40) 
(b) MATRICES, 45-MINUTE 


Equivalent                  Age groups                  Equivalent 
score     To 24  25-29  30-34  35-39  40-44  45-        score 
50 60 50 
49 60 58-59 49 
6 60 59 59 57 48 
47 59 59 51-58 57-58 55-56 41 
58 58 58 55-56 53-54 
45 37 37 55 54 52 45 
5 56 52-53 50-51 
43 56 55 55 53 50-51 28-49 43 
55 54 53-54 52 49 47 42 
41 54 53 52 50-51 47-48 45-46 41 
40 31 49 4 43-44 
39 53 50 48 4 42 39 
38 % 51 29 47 42-43 40-41 38 
37 51 46 41 37 
50 49 47 45 39-40 37-38 36 
35 49 48 43-44 37-38 35-36 35 
34 47 45 42 36 34 
33 48 46 41 35 32-33 33 
32 47 45 43 40 32-33 30-31 32 
31 46 44 39 31 31 
30 45 43 41 38 29-30 21-28 30 
42 40 37 21-28 25-26 29 
28 41 39 35-6 26 4 28 
27 43 38 34 24-25 22-23 27 
26 42 39 36-37 33 22-23 20-21 26 
25 41 3 32 21 25 
40 37 31 19-20 17-18 24 
23 33 30 17-18 15-16 23 
22 39 35 32 28-29 16 22 
21 38 34 31 27 14-15 12-13 21 
37 33 30 26 12-13 1 20 
19 36 32 29 25 11 9-10 19 
18 28 24 9-10 7-8 18 
17 35 31 27 23 7-8 17 
16 26 22 6 16 
15 33 29 25 20-21 4-5 2-3 15 
4 32 28 19 2-3 1 14 
13 5 27 23 18 1 13 
12 26 2 17 12 
11 30 25 20-21 16 11 
10 29 24 19 15 10 
9 28 23 1 13-14 9 
8 27 22 17 12 8 
7 26 21 16 11 d 
6 20 15 10 6 
5 25 19 14 9 5 
4 18 13 8 4 
3 23 17 12 6-7 3 
2 16 11 5 2 
1 15 10 4 1 




Table 38 EQUIVALENT SCORES CORRECTED FOR AGE (MEAN 40) 
(c) MILL HILL VOCABULARY TEST 
Equivalent          Definitions                    Synonyms             Equivalent 
score     To 24  25-34  35-44  45-     To 24  25-34  35-44  45-     score 
57 68 
56 66-67 я 
55 64-65 67-68 55 
54 62-63 66 68 54 
53 60-61 5 66-07 68 68 3 
52 58-59 62-63 64-65 66-67 2 
51 60-61 62-63 66 51 
50 55-56 58-59 1 6263 64 68 50 
49 53-54 56-57 58-59 60-61 62 66 6 49 
48 51-52 54-55 57 58-59 60 64 6 68 48 
47 29-50 53 5-6 56-57 58 62 64 66 47 
47-48 51-50 53-54 5 56 60 62 64 46 
45 46 49-50 51-52 52-53 34 58 ө 62 45 
4445 41-48 49-50 50-51 52 56 58 60 44 
43 4243 45-46 47-48 48-49 34 36 58 43 
42 4 43-44 46-47 50 54 56 42 
41 39-40 41-42 4 44-45 48 52 5 54 41 
40 37-38 40 41-2 42-43 46 50 50 52 40 
39 35-36 389 3 3941 4 48 48 
38 33-34 3637 37-38 37-38 42 46 46 38 
37 31232 34-35 35-36 3 40 44 44 48 37 
36 20-30 32-33 33-34 38 42 42 46 % 
35 27-8 30-31 31-32 31-32 40 40 44 35 
34 26 28-29 79-50 29-30 36 38 38 42 34 
33 24-25 21. 27-28 27-28 36 40 33 
32 2223 25-26 25-26 2 n 36 м 38 % 
31 20-21 2-4  23- 2 30 34 32 36 31 
30 18-19 21-22 21-22 21-22 28 32 30 34 30 
29 16-17 19-20 19-20 19-20 26 30 28 32 29 
28 14-15 17-18 17-18 16-18 28 26 30 28 
2 (лы: 6, is ip | > O NP 
25 8 12-13 12 10-11 | 20 24 22 24 25 
24 7 0-1 10-11 8-9 18 22 20 22 24 
23 Pep" 6-7 16 20 18 20 23 
4 BH MOTE О ТЕ 
20 Ы 1 10 14 12 14 20 
9 Е 8 12 10 12 19 
18 ка : 10 8 10 18 
17 6 6 8 17 
16 4 8 4 6 16 
15 2 6 2 4 15 
14 4 2 14 
13 2 3 


re-employment on grounds of intelligence alone, and a bias in this direction 
will be noticed in what follows. The case studies were preceded by a longer 
version of the following preamble: 




‘The analyses which follow go rather further than would be possible from an 
examination of the written data alone. The intention has been to show how 
test data may be worked into the general pattern of an individual’s history, 
as the latter emerges both from interview and from information extracted 
from questionnaires. Test data may to some extent be related to the latter 
without seeing the candidate at all, and this is what has in fact been done in 
preparing these studies. The next stage is to integrate these findings with the 
results of the interview, and this also has been done, although not perhaps 
in as great detail as could be achieved in a high proportion of cases .... 

‘Assessment of the total performance may proceed by the following 
stages: 


‘1. A consideration of the general intelligence level as revealed by the 
test results, and of whether this is a true estimate of the candidate's present 
and original intelligence level. 

‘2. Implications of discrepancies between (corrected) scores on 20-minute 
and 45-minute Matrices, on Matrices and Vocabulary, and on Definitions 
and Synonyms should be noted. The first of these comparisons has its bearing 
on “speed” versus “power”, and on questions of application or motivation; 
the second on verbal as against non-verbal intelligence, and on ageing 
generally; the third on fluency and “quality” of response. 

‘3. How far does the information thus obtained agree or conflict with the 
candidate's educational and occupational background, and his Army record? 


Table 39. EQUIVALENT SCORES CORRECTED FOR AGE: 
CONVERSION OF SUMMED EQUIVALENT SCORE TO 
OFFICER INTELLIGENCE RATING 


S.E.S.(4)      O.I.R.      Percentage of 
                           Serving Officers 

185-             10              5 
177-184           9             10 
167-176           8             20 
153-166           7             30 
143-152           6             20 
130-142           5             10 
126-129           4              4 
122-125           3              1 
118-121           2              0 
114-117           1              0 
-113              0              0 
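
The conversion in Table 39 is a simple band lookup, and a short sketch may make it easier to check the case studies against it. The sketch assumes that the O.I.R. column runs consecutively from 10 down to 0 (which agrees with the ratings quoted for Cases A to D); the function and variable names are illustrative only and are not part of the original Instruction.

```python
# Lower bound of each S.E.S.(4) band in Table 39, paired with its O.I.R.
# The assumption that the O.I.R. values run 10, 9, ..., 1 is ours;
# totals of 113 or less fall through to O.I.R. 0.
OIR_BANDS = [
    (185, 10), (177, 9), (167, 8), (153, 7), (143, 6),
    (130, 5), (126, 4), (122, 3), (118, 2), (114, 1),
]


def officer_intelligence_rating(ses4: int) -> int:
    """Return the O.I.R. for a summed equivalent score of four tests."""
    for lower_bound, oir in OIR_BANDS:
        if ses4 >= lower_bound:
            return oir
    return 0


# Corrected S.E.S.(4) totals quoted in the case studies.
for case, ses4 in [("A", 146), ("B", 155), ("C", 129), ("D", 118)]:
    print(case, ses4, officer_intelligence_rating(ses4))
```

Listing only the lower bounds keeps the table in one place and makes the open-ended top and bottom bands fall out of the same loop.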




‘4. If discrepancies appear, how far are they attributable to such causes 
as organic conditions, illness etc., or personality disturbances? 

‘5. Any attitudes revealed during the actual testing may be noted. Further 
indications, e.g. from handwriting, layout of questionnaires, etc., should not 
be overlooked.’ 


CASE STUDIES 
Case A.                               Equivalent scores 
Candidate aged 45     Raw score     Corrected     Uncorrected 

Matrices 
  20-minute               42            41             28 
  45-minute               45            41             29 
Vocabulary 
  Definitions             25            32             27 
  Synonyms                38            32             30 
S.E.S.(4)                              146        114 × ¾ = 86 
O.I.R.                                   6              4 


One of the few cases in which as much as 2 points difference between 
‘corrected’ and ‘uncorrected’ O.I.R. was encountered; taking the former as 
the truer estimate, the candidate’s level of intelligence is not below officer 
average. There is, however, a marked discrepancy between corrected equiva- 
lent scores on Matrices and Vocabulary. The low Vocabulary score accords 
well with his poor educational level [information available] for which his 
good general intelligence has compensated: he worked his way up from 
employment as driver to the position as manager of a transport firm. His 
Definitions are meagre and sometimes suggest that he had some idea of the 
meaning of a word although he could not express it properly; e.g. stance: 
‘method of poise’; latent: ‘dead’. In Synonyms his performance was per- 
functory and he left many blanks. In Matrices, on the other hand, he guessed 
a great deal, and wildly, his second attempts often being wrong as well as 
his first. All this fits in with the psychiatrist's description of him as a ‘lively 
hypomanic’. Note that the uncorrected equivalent scores misleadingly suggest 
that his non-verbal performance is of about the same level (below average) 
as his verbal. Note further that corrected equivalent scores are identical for 
the two forms of the Matrices test, and also, at their own level, for the two 
Vocabulary sub-tests. From this the general conclusion would be that this 
candidate has aged normally . . . 
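
The arithmetic behind the S.E.S.(4) row can be sketched as follows. The corrected total is the plain sum of the four equivalent scores; the uncorrected total is multiplied by a factor read here as three-quarters, which this section does not explain. Rounding halves upward is an assumption, chosen because it reproduces the printed figure of 86; the function names are illustrative only.

```python
def summed_equivalent_score(scores):
    """Sum the four equivalent scores (two Matrices, two Vocabulary)."""
    if len(scores) != 4:
        raise ValueError("expected four equivalent scores")
    return sum(scores)


def three_quarters(total):
    """Scale a four-test total by 3/4 in integer arithmetic.

    Rounding halves upward is an assumption made to match the printed working.
    """
    return (3 * total + 2) // 4


# Case A, as tabulated above.
corrected = [41, 41, 32, 32]
uncorrected = [28, 29, 27, 30]

print(summed_equivalent_score(corrected))                    # 146
print(three_quarters(summed_equivalent_score(uncorrected)))  # 86
```

The same two steps reproduce the totals quoted for the other cases, for example 90 scaled to 68 in Case C.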



Case B.                               Equivalent scores 
Candidate aged 25     Raw score     Corrected     Uncorrected 

Matrices 
  20-minute               51            44             36 
  45-minute               53            41             35 
Vocabulary 
  Definitions             30            35             30 
  Synonyms                40            35             31 
S.E.S.(4)                              155        132 × ¾ = 99 
O.I.R.                                   7              6 


Although a young man of good education (B.Sc. in Engineering at Man- 
chester) this candidate’s scores on Vocabulary are markedly lower than on 
Matrices. Synonyms show much guessing, only one item beyond No. 20 (No. 
33) being correctly answered. Definitions also included 9 items in which, 
although an answer had been attempted, no credit could be scored. Experi- 
ence, however, suggests that this pattern is not uncommon among certain 
scientists with a rather limited outlook. The Matrices test was done quickly, 
the candidate handing in his paper after only 10 minutes of the additional 
time, during which he added 4 responses, 2 being correct. On the whole the 
performance is a perfunctory one. The candidate was described as still 
suffering from the effects of a recent breakdown, in view of which the im- 
patience which he seems to have shown is perhaps not surprising, if seen as 
a hysterical mode of dealing with his anxiety. Nevertheless he showed a 
certain amount of control, and the psychiatrist considered it unlikely that 
he would break down again, unless under special stress. 


Case C.                               Equivalent scores 
Candidate aged 50     Raw score     Corrected     Uncorrected 

Matrices 
  20-minute               24            29             12 
  45-minute               29            31             16 
Vocabulary 
  Definitions             38            38             34 
  Synonyms                36            31             28 
S.E.S.(4)                              129         90 × ¾ = 68 
O.I.R.                                   4              0 


This is the largest discrepancy between corrected and uncorrected O.I.R. 
encountered. It should be noted, however, that 68 is just within Uncorrected 
O.I.R. 0, and that only the Definitions score raises the Corrected O.I.R. 
above the level of 2 or 3. And even the corrected O.I.R. is low enough to 
suggest doubtful suitability for re-employment. 

In Matrices there was little correct after B8, but some responses were 
successfully amended. Vocabulary showed much guessing in both parts, 
but towards the end of Definitions the candidate seemed to be losing con- 
fidence. In interview he was very talkative, which seems to link up with his 
comparative fluency in Definitions. Although the candidate is undoubtedly 
of rather low intelligence, the impression is of a man of considerable energy 
and resource, and his military history contained a record of a variety of 
employments, which he carried out with a fair degree of success. He felt that 
he had been badly treated in the Army, and this indication of mild paranoid 
tendency seemed to the psychiatrist to find confirmation in his evident 
interest in words for their own sake. 


Case D.                               Equivalent scores 
Candidate aged 28     Raw score     Corrected     Uncorrected 

Matrices 
  20-minute               34            28             21 
  45-minute               35            22             21 
Vocabulary 
  Definitions             32            36             31 
  Synonyms                36            32             28 
S.E.S.(4)                              118        101 × ¾ = 76 
O.I.R.                                   2              3 


One of the few cases of a corrected O.I.R. lower than an Uncorrected. 
While all possible sources of this rather exceptional phenomenon were not 
investigated, it appears that in a younger man this can only be caused by 
one of two things. It may be produced by an unusually low all-over Matrices 
score, well below the average for the age in question. Or it may result from 
a performance on 45-minute Matrices which adds little to what was done in 
the 20-minute period and which is more typical of the older age groups. Un- 
corrected standards suggest that for low intelligences such a small increase 
is normal, as is seen in this candidate's equal Uncorrected Matrices scores. 
On Corrected standards, however, the 45-minute score is considerably lower, 
and it is this that seems to bring the O.I.R. down from 3 to 2. And indeed the 
general picture here is of a feeble personality prematurely tired and aged, 
the performance being typical of an older candidate, with poor motivation 
and little adaptability. On all tests this candidate worked at a medium rate, 
showing little insight in the more difficult 




His dossier showed him to be still suffering from the effects of poliomyelitis 
contracted in the Middle East. He was diffident and hesitant in speech. His 
education was better as regards background than in achievement—ordinary 
secondary school, but no certificates. 




APPENDIX III 


THE MITCHELL VOCABULARY TEST 




Vernon (45) makes reference to this test in the following terms: 


‘. . . When older, serving officers came up for re-allocation both [the Mill 
Hill Vocabulary test] and other verbal tests often seemed to arouse a good 
deal of anxiety, and a more suitable test was devised consisting of 15 words 
(such as FORM and вит) for each of which four different meanings are to be 
given in writing in 15 minutes.’ 


The author of the test, in the R.T.C. Instruction describing it, enlarges on 
the subject of anxiety, which is of course inherent in other test situations as 
well. The following is a slightly abbreviated and adapted version of what 
were described as ‘the Reasons for the Test’. 


It is recognized that written intelligence tests tend to create anxiety, either 
because of the material used (verbal or non-verbal), or because of the form 
of the response required. In respect of the material used it could be claimed 
in general that intelligence tests using verbal, and therefore broadly familiar, 
material should be less anxiety-arousing than tests based on more novel 
material (patterns, symbols). With regard to the form of response, least 
anxiety is aroused where the candidate is not acutely aware of how well 
he is doing (apart from the number of items completed), most where he 
knows that he is not getting the correct answers (or any answers at all). This 
does not simply mean that the selective answer type of question is less 
anxiety-arousing than the inventive, although on the whole this is probably 
true. For example, in a pattern-completion test, such as Progressive Matrices, 
it is less disturbing to choose an answer, although one is vaguely aware of the 
shakiness of the reasoning, than it is to guess at the word meaning the same 
as a given word from a group of six. In short, it may be the combination of 
verbal material with a consciousness that one is not doing well in the test 
that constitutes a situation most likely to produce the greatest amount of 
anxiety. 



At W.O.S.B.s (Officers) the opinion was frequently expressed that an undue 
amount of anxiety was aroused, especially among the older officers, by the 
Vocabulary Test. In both parts of the test a stage is reached beyond which 
the individual knows clearly that he cannot proceed. The saturation point 
is reached when the words are unfamiliar in appearance, being words in- 
frequently met in reading and probably never used in conversation. Since 
words are the everyday method of communication, the factor of length of 
experience—i.e. age—is important in the building up of vocabulary. Thus 
when older subjects, especially, are presented with a test in which they are 
required to use words, it is reasonable to assume that they will regard it as 
offering some security, since it allows them to operate in a known field. The 
need to derive security from the word test may be heightened if the general 
situation in which the subject finds himself has tended to arouse anxiety in 
any case. . . . Excellent as the Mill Hill Vocabulary Test proved itself for some 
purposes in the Army, the factors in the re-allocation situation were such as 
made it important to exclude any test which would in any degree tend to 
heighten the existing disturbance in the candidates. The added anxiety which 
the Mill Hill Vocabulary Test did in fact arouse was probably due to the shock 
of the contrast between the expectation and the actuality. The candidate 
approached the test with some confidence, feeling that he could do well in 
this familiar medium, only to find that after a certain stage the material was 
completely beyond him, involving words that he did not even recognize by 
sight. It was to meet such a situation as this that the Mitchell Vocabulary 
Test was devised. 


RATIONALE 


The rationale of the form of vocabulary test here described may be introduced 
by a quotation from Wechsler (47): ‘The one serious stricture that can be 
made against a vocabulary test as a measure of a man’s intelligence is that 
the number of words a man acquires must necessarily be influenced by his 
educational and cultural opportunities.’ Some of the educational influence 
arises from a common method of compiling the list of words to be used in 
a vocabulary test, i.e. to take words in a certain sequence from a standard 
dictionary, e.g. every tenth word from the top of every tenth page. While this 
may give an appearance of scientific sampling there is little reality behind it. 
Such a method provides a sample of the complete verbal heritage of a nation, 
but no guarantee that the chosen words are to any extent used by the people 
of the nation. In a vocabulary test so constituted, with the words arranged 
in increasing order of difficulty, the further the progress the greater becomes 
the influence of specialized training. After a certain stage the words are not 
in common usage; consequently many people do not recognize them and 
are unable to make any attempt to use the words or to discover their meaning. 
Variations in standards of education and in direction of culture will affect 
to a lesser degree a vocabulary test which postulates as of prime importance 
the ability to use common words in different contexts rather than the estima- 
tion of the number of words known. Words are tools; we are interested in 
differences in individuals not in respect of the number of tools held, but rather 
in differences in the skill and precision with which commonly possessed tools 
are manipulated. Words do not exist without a context, and a vocabulary 
test that postulates the use of words is conforming to this basic linguistic 
reality. Word usage does not necessarily involve the construction of a sen- 
tence containing the word. The method adopted in the present test has 
implicit in it the structuring of a context, by having the subject give several 
different meanings of common words, the answer being inventive, and 
expressed as a phrase or group of words, as the subject wishes. 

Thus the two main considerations underlying the present test are as follows. 
First, the words must be familiar in appearance to all subjects, and recogniz- 
able by them. This will allay anxiety. At no stage will the subject be discour- 
aged by encountering completely unfamiliar material. The test endeavours 
to maintain interest and to stimulate the subject to give of his best by a 
positive motivation towards the test situation, the subject feeling that since 
the material is familiar the solution may be within his power. The second 
consideration, which will reveal real individual differences, is the use that 
the subject makes of these words and the ways in which he is able to manipu- 
late them. Although a word is a symbol it may symbolize several realities, 
each one dependent upon a context. The aim is to assess the individual's 
knowledge of the number of contexts in which the word operates for him as 
a significant symbol. 


SUCCESSIVE DRAFTS 


Three successive drafts of the test were tried out, and recommendations for 
further slight modifications were embodied in a Fourth Draft, on which the 
form of the test as here presented is based. It should be noted, however, that 
norms etc. are derived from the Third Draft, but since this differed from the 
present form only in respect of order of items and in details of instructions 
(see below, p. 139), the standardization may be assumed to hold good, at 
least until further data are available. Differences in content between the 
various drafts are summarized in Table 40; a fuller account of these, and of 
the evolution of the test, is given in the following paragraphs. 




Table 40 MITCHELL VOCABULARY TEST: WORD CONTENT AND ORDER IN SUCCESSIVE DRAFTS 


Words        First     Second    Third     Fourth 
             Draft     Draft     Draft     Draft 

BEAR           1 
REAR           3 
SOLE           5 
POST           6        A4 
MEAN           7        B2 
REEL           9        B3 
BULL          13        B7 
STALK         14 
BAR           16        A6 
STRAIN        18        A9 
BASE          19        B13 
PEER          20 
DEAL          21        B8 
FAST          22        A10 
BLUFF         24 
STRAND        25 
GRATE         26 
STROKE        27        A2 
GRAIN         28        A12 
BOUND                   A3 
ACCOUNT                 B10 
STRIKE                  B11 
RANGE                   B15 
FAIR          11        A1         2         1 
TIE            2        B1         1         2 
BOW                     B14        4         3 
PORT          12        B9         6         4 
BOLT                    A14        5         5 
SOUND         10        B6         3         6 
FENCE          4        A11        8         7 
роск          30        A7        10         8 
SUIT           8        A5         9         9 
KEY           17        B5         7        10 
PRESS                   A13       11        11 
PLOT                    A15       15        12 
CLIP          29        B12       13        13 
STUD          15        B4        14        14 
MAIN          23        A3        12        15 


First Draft. The First Draft consisted of 30 words, three meanings of each 
being required. Twenty minutes were allowed. No instructions about order 
of working were given, but the layout (similar to the example below) was 
suggestive of working straight through, i.e. completely disposing of each word 
in turn before going on to the next. 
Example: 
POUND (а) а sum of money 
(b) to batter or break into small pieces 
+ (с) 1602. 




The test was given to a W.O.S.B. population of 230 candidates, yielding 
a Mean score of 64.45 (possible score 90), Median 66.34, and S.D. 4.85. 
These figures indicated negative skewing and poor discrimination, especially 
at the upper end. 


Second Draft. To meet the drawbacks just noted, a Second Draft was pre- 
pared, requiring four meanings for each word; an effort was also made to 
discover which words would yield optimum discrimination. Thirty words 
were used, divided into two sets, A and B, of 15 words each. Both sets were 
given to a group of 88 candidates at W.O.S.B.s, with an interval of one day 
between, it being arranged that half of the candidates would take Set A 
before Set B and the other half Set B before Set A. While 20 minutes was 
set as the time-limit for each test, extra time was allowed in order to provide 
additional information on items near the end of each Set. On the basis of 
an item analysis, 15 words were chosen to form the material of the Third 
Draft. 

Third Draft. With the Third Draft, an important change in instructions, and 
in the layout of the answer sheet, was introduced. In the previous versions 
of the test it had been assumed (see above, p. 137) that the candidate would 
complete all meanings (or as many as he could) for one word before pro- 
ceeding to the next. This meant that in terms of order of difficulty the test 
consisted of a series of ‘waves’, each word proceeding from ‘trough’ (first and 
easiest meaning) to ‘crest’ (last and most difficult meaning), followed by a 
fall to the easiest meaning of the next word. This pattern of order of difficulty 
was considered to be a disadvantage in a timed test. Personality difficulties, 
e.g. of an obsessional nature, might prevent one person from leaving a word 
until he had extracted all he could from it. He might, therefore, not reach 
the later words at all, of which a first and probably a second meaning 
would be easy for him. In order, therefore, to allow for a smooth increase in 
order of item difficulty it was decided that first and easiest meanings should 
be given for each of the fifteen words before proceeding to second and there- 
fore more difficult meanings; similarly, second meanings for all words before 
proceeding to third meanings, and third before fourth. This meant that there 
was not the same prolonged concentration on any one word, and it was 
therefore important that meanings already given for one word should be 
conveniently before the eye when searching for further meanings of that word. 
The Third Draft of the test provided an answer sheet with the fifteen words 
set down the left-hand side of the page, and the page divided into four 
columns A, B, C, and D. The subject was instructed to complete columns 
A, B, C, and D in that order. Thus his responses to any one word were set out 
across the page opposite the word. This form of layout was similar to that 
reproduced here (p. 146). 
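
The contrast between the two orders of working can be sketched in a few lines. The sketch takes the meaning number (first, second, third, fourth) as a stand-in for item difficulty, which is a simplification of the argument above rather than anything measured; the function names are illustrative only.

```python
def word_by_word(n_words=15, n_meanings=4):
    """Order assumed in the earlier drafts: finish each word in turn."""
    return [(w, m) for w in range(1, n_words + 1)
            for m in range(1, n_meanings + 1)]


def column_by_column(n_words=15, n_meanings=4):
    """Order prescribed for the Third Draft: column A, then B, C and D."""
    return [(w, m) for m in range(1, n_meanings + 1)
            for w in range(1, n_words + 1)]


def difficulty_profile(order):
    """Meaning number of each successive item (our proxy for difficulty)."""
    return [m for _, m in order]


print(difficulty_profile(word_by_word())[:8])      # [1, 2, 3, 4, 1, 2, 3, 4]
print(difficulty_profile(column_by_column())[:8])  # [1, 1, 1, 1, 1, 1, 1, 1]
```

The first profile shows the repeated ‘waves’; the second rises only once every fifteen items, which is the smooth increase the Third Draft was aiming at.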

Experience in administering the Third Draft revealed objections to a strict 
insistence on this one method of completing the test; in any case it was 
impossible to enforce the prescribed method. In practice candidates tended 
to adopt a mixture of working down the columns and along the rows. As a 
result of this tendency the instructions were amended to allow complete 
freedom in the method of tackling the test (see Instructions, pp. 142-3, below). 

The Third Draft was given to 269 W.O.S.B. candidates, most of whom were 
candidates at No. 14 W.O.S.B. whose other test results are treated elsewhere 
in this manual. It consisted of 15 words selected from the 30 of the Second 
Draft; the time-limit was 15 minutes. 

Results were as follows: 
Mean 35.05 (possible 60), Median 34.85, S.D. 6.18. 
The distribution was slightly leptokurtic, and showed slight but insignificant 
(P > 0.4) positive skewing. 
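
The direction of skew in the First and Third Drafts can be checked roughly from the summary figures alone. The sketch below uses Pearson's median coefficient, 3(mean − median)/S.D.; this is a substitute for whatever moment-based test lay behind the quoted probability, which the text does not specify, and the function name is illustrative only.

```python
def pearson_median_skewness(mean, median, sd):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / S.D."""
    return 3.0 * (mean - median) / sd


# Summary figures as printed for the First Draft and the Third Draft.
first_draft = pearson_median_skewness(64.45, 66.34, 4.85)
third_draft = pearson_median_skewness(35.05, 34.85, 6.18)

print(f"First Draft: {first_draft:+.2f}")   # negative: mean below median
print(f"Third Draft: {third_draft:+.2f}")   # small and positive
```

A clearly negative value for the First Draft and a value close to zero for the Third agree with the verbal descriptions given for the two drafts.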

Equivalent scores were also calculated, and are incorporated in Table 17 
(p. 89). 


Fourth (Final) Draft. The final draft, embodying an altered item order based on 
the order of difficulty indicated by an analysis of the Third Draft and the 
freer instructions, has not been used in W.O.S.B.s. It is, however, suggested 
that this should be the form of the test adopted for any future use. The 
recommended Instructions will be found following the section on ‘Relation 
to Anxiety’, p. 142, below. 


RELIABILITY, VALIDITY, AND FACTORIAL CONTENT 


The reliability of the test, assessed by the split-half method, and corrected for 
length by means of the Spearman-Brown prophecy formula, was 0.883, a very 
satisfactory figure considering the restricted nature of the sample. 
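
For readers who wish to retrace the correction for length, the Spearman-Brown formula for a test lengthened by a factor k is r_k = k·r / (1 + (k − 1)·r), with k = 2 for a split-half estimate. The uncorrected half-test correlation is not reported; the sketch below merely works backwards from the corrected figure, and the function names are illustrative only.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability of a test lengthened by `factor`."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def implied_half_correlation(r_full):
    """Invert the two-fold formula: the half-test r that would give r_full."""
    return r_full / (2.0 - r_full)


# Half-test correlation implied by the corrected figure of 0.883
# (not a value reported in the text).
r_half = implied_half_correlation(0.883)
print(round(r_half, 3))                  # about 0.79
print(round(spearman_brown(r_half), 3))  # recovers 0.883
```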
Correlations with other W.O.S.B. tests were as follows: 


Matrices (1943)        .273 
V.I.T.                 .584 
Reasoning              .553 
Shortened Wechsler     .559 
Squares                .339 


All these coefficients are reliable at well beyond the P = 0.01 level of con- 
fidence. 
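
The significance claim can be checked with the usual t-test for a correlation coefficient, t = r·√(n − 2) / √(1 − r²). The sketch assumes that all 269 Third Draft candidates contributed to each coefficient, which the text does not state, and reads the second test label as V.I.T.; the two-sided 1 per cent critical value for samples of this size is taken as roughly 2.6.

```python
import math

N_CANDIDATES = 269  # assumption: the full Third Draft sample


def t_for_correlation(r, n=N_CANDIDATES):
    """t statistic for testing a correlation against zero, with n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


correlations = {
    "Matrices (1943)": 0.273,
    "V.I.T.": 0.584,
    "Reasoning": 0.553,
    "Shortened Wechsler": 0.559,
    "Squares": 0.339,
}

for test, r in correlations.items():
    print(f"{test:20s} r = {r:.3f}  t = {t_for_correlation(r):.2f}")
```

Even the smallest coefficient, that with Matrices, gives a t well above the critical value under this assumption.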




A factor analysis reported by the author yielded the factor loadings (first 
approximation, unrotated) shown in Table 41. 


Table 41 MITCHELL VOCABULARY TEST: AUTHOR'S 
UNROTATED LOADINGS 


                  I        II 

Reasoning       .854     -.148 
V.I.T.          .792      .213 
Vocabulary      .720      .095 
Wechsler        .675      .116 
Grouping*       .619      .363 
Matrices        .593     -.311 
Squares         .429     -.333 


* The Grouping Test, also undergoing development at this time, was a written test using 
the principle of Object-Sorting, Part I (Rapaport, 34, Vol. I), but without the concrete 
material. The subject is required to select from a ‘pool’ of words the name of an object 
which has some quality in common with a small group which serves as sample. The 
principle is thus also similar to one familiar in ordinary intelligence testing (e.g. in N.I.I.P. 
Group Test 33) except that the words in the sample groups are also always drawn from the 
‘pool’ just mentioned; thus different aspects of the same object have to be recognized and 
handled. The test’s correlations with the rest of the W.O.S.B. battery are very similar to 
those of Vocabulary. 


It was suggested that the loading of the first factor compared favourably 
with that of the recognized intelligence tests, and that the bipolar factor, 
interpreted as positive in the four tests using verbal material (as shown), was 
clearly of a verbal nature, but was not high in the Vocabulary test. 

Since, however, these two factors account for only 52.5 per cent of the total 
variance of the battery, and since it seemed clear that it would be possible to 
extract a second bipolar factor, the analysis was repeated, using improved 
communalities and making allowance for three factors. Unrotated loadings 
were obtained as shown in the left-hand portion of Table 42. 
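
The figure of 52.5 per cent can be retraced from the two-factor loadings of Table 41: the share of total variance attributed to a factor is the sum of its squared loadings divided by the number of tests. The loadings below are as read from that table (with the second test taken to be V.I.T.); the function name is illustrative only.

```python
# Unrotated loadings on Factors I and II, as read from Table 41.
TABLE_41 = {
    "Reasoning": (0.854, -0.148),
    "V.I.T.": (0.792, 0.213),
    "Vocabulary": (0.720, 0.095),
    "Wechsler": (0.675, 0.116),
    "Grouping": (0.619, 0.363),
    "Matrices": (0.593, -0.311),
    "Squares": (0.429, -0.333),
}


def variance_shares(loadings):
    """Proportion of total variance carried by each factor."""
    n_tests = len(loadings)
    n_factors = len(next(iter(loadings.values())))
    return [
        sum(row[k] ** 2 for row in loadings.values()) / n_tests
        for k in range(n_factors)
    ]


shares = variance_shares(TABLE_41)
print([round(100 * s, 1) for s in shares])  # per cent for Factors I and II
print(round(100 * sum(shares), 1))          # close to the quoted 52.5
```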


Table 42 FACTOR LOADINGS OF THE MITCHELL VOCABULARY TEST AND ASSOCIATED TESTS 


                      Unrotated                     Rotated 

                  I       II      III         I        II       III 

Reasoning       .876    -.155    .097        .64      .38      .50 
V.I.T.          .769      ?      .090        .51      .50      .29 
Vocabulary      .736     .269   -.241        .51      .64     (.02) 
Wechsler        .657     .208   -.081        .40      .55      .15 
Grouping        .632     .309    .327        .12      .62      .46 
Matrices        .596    -.480    .249        .53    (-.05)     .61 
Squares         .425    -.222   -.345        .59     (.06)   (-.05) 




These three factors account for, respectively, 46.7 per cent, 7.4 per cent and 
5.2 per cent of the total variance, or 59.3 per cent in all. 

Rotation of axes through 35 degrees between Factors I and II strengthens 
the interpretation of Factor II as a verbal factor. The Vocabulary test 
emerges with a high loading, in contrast to the author’s contention, but in 
agreement with a remark by Vernon (45, p. 59). The second bipolar factor 
seems capable of interpretation as a ‘conceptual thinking’ factor, and axes 
may be rotated to a fresh position (through 31 degrees) in support of this 
hypothesis. 

The loadings following from these two rotations are shown alongside the 
unrotated loadings in Table 42, rounded off to two figures, and with non- 
significant loadings in brackets. It is not suggested that the first (‘general’) 
factor does better than approximate to g. The loadings are low for this to be 
likely, even taking account of the nature of the sample. The loading for 
Squares, however, is high; this may be due to there being little scope in this 
battery for the emergence of a k:m factor. Hierarchies in respect of the 
interpretations proposed are satisfactory, even in detail. 
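
The two rotations can be retraced numerically. The sketch assumes that the second rotation (31 degrees) is between the already rotated Factor I and Factor III, and that the direction of rotation is the one shown; both assumptions were chosen because they reproduce the printed rotated loadings for the rows tried. The Vocabulary and Matrices rows are read from Table 42 as .736, .269, −.241 and .596, −.480, .249; the function names are illustrative only.

```python
import math


def rotate(a, b, degrees):
    """Rotate a pair of loadings through an angle; returns (a', b')."""
    t = math.radians(degrees)
    return (a * math.cos(t) - b * math.sin(t),
            a * math.sin(t) + b * math.cos(t))


def rotate_loadings(f1, f2, f3):
    """Apply the 35-degree (I, II) and 31-degree (I, III) rotations in turn."""
    g1, g2 = rotate(f1, f2, 35)   # Factors I and II
    h1, h3 = rotate(g1, f3, 31)   # rotated Factor I and Factor III
    return h1, g2, h3


vocabulary = rotate_loadings(0.736, 0.269, -0.241)
matrices = rotate_loadings(0.596, -0.480, 0.249)

print([round(x, 2) for x in vocabulary])  # [0.51, 0.64, 0.02]
print([round(x, 2) for x in matrices])    # [0.53, -0.05, 0.61]
```

Because each step is an orthogonal rotation, the sum of squared loadings for a test (its communality) is the same before and after, which gives a quick check on any row.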

Conclusions with regard to the Vocabulary test would appear to be that 
its g-loading is approximately the same as those of the other tests in the 
W.O.S.B. battery, and that it has a sizeable loading in a verbal factor. 


RELATION TO ANXIETY 


Unfortunately, circumstances prevented the obtaining of any evidence re- 
garding the attitude of older subjects to the test. Among the young officer 
candidates to whom the Third Draft was administered it occupied an inter- 
mediate position on a scale of preference in relation to other tests taken. A 
large proportion did not feel strongly either way; while a few professed 
enthusiasm, only two stated that they disliked it. Thus for subjects of about 
twenty years of age it appears that the Mitchell Vocabulary test arouses no 
more anxiety than any other test. 


INSTRUCTIONS 


1. Fill in the details required on the sheet, then read the instructions and 
look at the example. Do not turn the paper over until I tell you. 


2. (When all have finished reading the Instructions): 
This test contains a list of 15 words with which you are all familiar. These 
words are all used at different times with different meanings; we want you 
to try and give four different uses of each word. 

There are two points to remember here: 

Firstly, you are asked to give meanings as different from each other as 
possible. For example, as meanings for the word ‘lock’ it would be possible 
to give under A ‘to fasten’, and under B ‘an instrument for closing a door’. 
As you can see, however, there is no essential difference between the two 
meanings. Always try to give meanings which are quite different from each 
other, but if you are stuck for a meaning and cannot decide whether two that 
you have in mind are the same thing or not, put them both down rather than 
leave a blank. 

Secondly, you will have seen from the example what is meant by giving a 
meaning of the word. It is not correct simply to give a sentence containing 
the word. As a meaning for ‘bar’ it would have been wrong to write down 
under A ‘He went to the bar for a drink’, or ‘the bar in a public house’. 


3. You are quite free to use any number of words to get your meaning over. 
Don’t spend a lot of time trying to express yourself in literary terms, but be 
sure that your meanings are clear and unambiguous. 


4. Now for the method of working through the paper: There are four 
columns, A, B, C, and D, for the four meanings of each word. One way is 
to give as many meanings as you can fairly easily for one word, before 
going on to the next word. On the other hand, you may prefer to work down 
column A first, before beginning Column B, giving the first meaning that 
comes to mind for each of the words before beginning to look for second 
meanings at all. Or you may find when doing the test that you want to use a 
mixture of these methods. In short, you are free to tackle the paper in which- 
ever way you like. 


5. You have fifteen minutes in which to get down as many meanings as you 
can. I shall let you know when you have had five minutes, and also when 
you have five minutes left. 

Are there any questions? 

All right, then: Turn the paper over and begin.

(Start timing). 


SCORING 


One point is allowed for each correct definition. Scoring is best done along 
the rows. Criteria are appended. In one or two cases additions have been
made, since it seems that differences might legitimately be claimed to exist
between varieties of meaning originally grouped together as substantially 
the same. Such additions (and supplementations falling short of the status of 
independent meanings) are shown in italic. Conversely, it might be argued 
that some meanings listed as different (e.g. STUD, 1 and 2) are not really 
distinguishable and still further revisions of the test might eliminate doubtful 
items of this kind. Insight into the quality of the subject’s conceptual thinking 
may, however, be obtained from a consideration of what meanings he is 
prepared to accept (or, more properly, to propose) as different. 
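
As a rough illustration of the scoring rule, the following Python sketch totals one paper. It assumes (the manual does not say so explicitly) that the scorer has already matched each written response to a criterion letter, or to nothing, and that two responses matched to the same criterion earn only one point. The function names and the sample judgements are hypothetical.

```python
def score_word(matched):
    """Score one row of the paper (columns A-D for a single word).

    `matched` holds, for each column, the criterion letter the scorer
    judged the response to fit, or None for a blank or wrong response.
    Assumption: repeated use of one criterion counts once only.
    """
    return len({letter for letter in matched if letter is not None})


def score_paper(rows):
    """Sum the row scores, working along the rows as the manual suggests."""
    return sum(score_word(matched) for matched in rows.values())


# Hypothetical judgements for two of the fifteen words.
sample = {
    "FAIR": ["a", "c", "b", None],  # three distinct meanings
    "DOCK": ["a", "a", "c", None],  # (a) given twice, counted once
}
total = score_paper(sample)
print(total)
```

The judgement of which criterion a response fits remains with the scorer; the sketch only does the counting.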


CRITERIA 


1. FAIR (a) Market, place of amusement
(b) Light in colour (of hair), blond
(c) Just, honest
(d) Moderately good (e.g. of marks)
(e) Beautiful (esp. of women, or of weather)


2. TIE (a) Neckwear, part of clothing
(b) To fasten, join; a knot, coupling, connexion
(c) A bond, attachment (e.g. family), hence hindrance, drawback
(d) Equality, a ‘dead-heat’
(e) Game between teams, e.g. ‘cup-tie’


3. BOW (a) Front part of ship
(b) To bend, a bending of the body, esp. in salutation
(c) An instrument for shooting arrows, or anything shaped like
it (e.g. legs, window)
(d) A knot with loops, a looped knot of ribbons
(e) To submit


4. PORT (a) Wine
(b) Harbour, haven
(c) Side of ship, left-hand side
(d) Aperture (window) in a ship
(e) Method of carrying arms; bearing


5. BOLT (a) A bar for fastening; to fasten
(b) To swallow quickly or greedily
(c) To rush away; escape, flee
(d) Arrow; any missile, as in thunderbolt
(e) Bale of cloth




6. SOUND (a) Noise; to make a noise
(b) Bay, channel, part of sea
(c) Healthy, reliable
(d) To test, find the depth, inquire, question


7. FENCE (a) Hedge, barrier; to enclose, build a barrier
(b) To indulge in sword-play
(c) Receiver of stolen goods
(d) To evade the issue, prevaricate


8. DOCK (a) Place where ships tie up; to tie up a ship
(b) Part of court where the accused stands
(c) To cut short, to clip
(d) A weed


9. SUIT (a) Clothing
(b) To be to one’s taste, be appropriate, etc.
(c) An action at law
(d) Courtship
(e) Playing-cards of the same sort


10. KEY (a) Instrument for opening a door, or for winding, etc.
(b) Solution, clue, guide, translation
(c) Middle stone of an arch; most important (esp. of a position
or person)
(d) Part of a piano, organ, etc.
(e) Arrangement of notes in a musical scale


11. PRESS (a) To push with steady force, squeeze, embrace etc.; an
instrument for squeezing by applying weight
(b) Printing machine; publishing firm; newspapers (collectively)
(c) Cupboard
(d) Crowd of people
(e) To go forward with great speed
(f) To urge strongly, lay stress upon


12. PLOT (a) Story of a play or novel
(b) Conspiracy; to conspire
(c) Piece of ground
(d) To make a plan of; indicate position on a map or on
geometrical co-ordinates etc.




[Test word list: 1. FAIR, 2. TIE, 3. BOW, 4. PORT, 5. BOLT, 6. SOUND, 7. FENCE, 8. DOCK, 9. SUIT, 10. KEY, 11. PRESS, 12. PLOT, 13. CLIP, 14. STUD, 15. MAIN]


13. CLIP (a) To fasten, a fastener, brooch, etc.
(b) To snip, cut off; what has been cut (e.g. fleece)
(c) A sharp blow
(d) Speed or pace


14. STUD (a) Fastener (e.g. collar stud)
(b) Nail with large head
(c) To adorn with knobs; set thickly
(d) Place where horses are bred; collection of horses kept for
breeding


15. MAIN (a) Chief, most important
(b) The sea
(c) Might, strength
(d) Large pipe for water, cable for electricity etc.
(e) A match at (wrestling or) cock-fighting




APPENDIX IV 


TRANSLATED VERSIONS OF THE REASONING TEST 




Specially prepared versions of the Reasoning test were available at Pemberley 
for candidates whose native or preferred language was other than English. 
Some difficulty was experienced in preparing suitable alternatives to certain 
of the items in the English version, but on the whole a fairly close approxima- 
tion to the distribution of types of problem was maintained. Items can in 
general be classified into three groups: those predominantly verbal in 
character, those based on number relationships, and a somewhat mixed 
group based on position of letters in the alphabet, coding relationships, etc. 
These are designated ‘Letter’ in Table 43, which shows the distribution among 
these three groups in the final form of the test for all languages. 
The following are examples (not included in the test) of each type: 


year month foot 
10 8 6 4 
AZ BY CX 


Table 43 REASONING TEST: CONTENT OF TRANSLATED VERSIONS

Language      Verbal   Number   Letter
English         22        6       12
French          16        9       15
Polish          17       10       13
Dutch           16        9       15
Danish          15       10       15
German          15        9       16
Hungarian       19        7       14
Czech           15       10       15


Of the 40 items, 25 (10 verbal, 6 number, and 9 letter) were common to all
versions, but there is no way, short of item analysis (now no longer possible),
of determining whether ostensibly identical items presented the same degree
of difficulty in different languages, or possibly to nationals of different
countries. To take a single example, the Danish ‘is’ (ice), required as the
response to a particular question, was provided for on the test paper by two
dots (..) and the number of words which could ‘fit’ must be considerably
smaller than, say, in the case of the French ‘glace’. It will be noticed, too, that
in no other language did it seem possible to invent a sufficient number of
suitable purely verbal items commensurate with the notorious Anglo-Saxon
fondness for word-play.
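
The figures in Table 43 can be checked for internal consistency. The short Python sketch below (the variable names are illustrative, not part of the original material) confirms that every version contains 40 items and that no version has fewer items of a given type than the stated common core of 10 verbal, 6 number, and 9 letter items.

```python
# Item counts per language as (verbal, number, letter), from Table 43.
TABLE_43 = {
    "English": (22, 6, 12),
    "French": (16, 9, 15),
    "Polish": (17, 10, 13),
    "Dutch": (16, 9, 15),
    "Danish": (15, 10, 15),
    "German": (15, 9, 16),
    "Hungarian": (19, 7, 14),
    "Czech": (15, 10, 15),
}
COMMON = (10, 6, 9)  # items common to all versions, as stated in the text

totals = {language: sum(counts) for language, counts in TABLE_43.items()}
all_forty = all(total == 40 for total in totals.values())

# No version can hold fewer items of a type than the common core contains.
core_fits = all(
    count >= common
    for counts in TABLE_43.values()
    for count, common in zip(counts, COMMON)
)
print(all_forty, core_fits, sum(COMMON))
```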

Differences in mean and variance between the French and English versions 
are evident in the tables of equivalent scores (Table 15, p. 84). Table 44, below, 
further shows means, ranges, and standard deviations for all versions, to- 
gether with the mean score on T-M-K (the only test done by virtually the 
entire population), and mean equivalent scores on Matrices (pooling all
versions and assuming equivalence for these). Correlations with the other 
tests studied will be found in Tables 21 and 22 in Chapter 8. 


Table 44 COMPARISON OF PERFORMANCE ON TRANSLATED VERSIONS OF REASONING TEST

                        Reasoning                    Mean on
Language       n     Mean    Range    S.D.      T-M-K    Matrices
English        56    19.1     7-34    6.80       47.5    [illegible]
French        367    19.8     0-35    7.62       49.0    30.3
Polish         89    14.5     3-28    6.21       44      27.8
Dutch          41    23.8     6-34    7.31       58.1    32.1
Danish         27    23.2     8-32    7.33       56.9    30.9
German         22    18.4     4-30    8.31       44.5    28.6
Hungarian      18    24.2     3-32    7.35       53.0    32.3
Czech          insufficient data


It is difficult to draw clear-cut conclusions from the above table, but it 
would appear that the Dutch, Danish, and Hungarian language groups 
evidently constitute ‘superior’ samples, while the Polish and German groups 
represent lower levels of ability as measured by T-M-K and Matrices. The 
very much lower mean for Polish Reasoning, however, further suggests that 
this version of the test was disproportionately ‘difficult’. A comparison of 
the French and English groups seems to pose more complex problems. The 
lower French Matrices mean, associated with a slightly higher Reasoning 
mean, suggests that the Reasoning test is more in line with French ways of 
thought than is Matrices, particularly since the evidence of the table of 
equivalent scores (Table 15, pp. 84, 85) is that the French version of Reasoning
was, if anything, more ‘difficult’ than the English.




APPENDIX V


ILLUSTRATIVE CASE MATERIAL 




The hope that the full Pemberley material would become available for 
analysis was unfortunately not realized, and only two detailed case studies 
suitable for publication are now extant. These were originally chosen almost 
at random by Semeonoff to serve as illustrative examples appended to a paper 
read at the Durham meeting of the British Psychological Society in April 
1946, when all documents were still available. (Eventually only one case— 
Case E herein—was reported, and that very briefly.) 

Four further cases, also previously reported in outline by Semeonoff (39), 
are described in much less elaborate form than the two just mentioned. Since 
little more than the test scores is available for these, the inferences drawn 
cannot, of course, be verified. 

The general arrangement of the two case studies described in detail is as 
follows: 

Raw and equivalent scores on all written and performance tests contribut- 
ing to the Intelligence Grading are quoted, followed by this ‘global’ rating, 
with comments. 

Detailed notes taken during the administration of the tests are reproduced 
or amplified. Some indication of performance on other psychological tests 
is added. 

Relevant information, including assessment gradings, obtained from other 
parts of the procedure follows. 

Finally, a brief concluding summary calls attention to points of diagnostic 
interest, particularly those which emerge from the patterns of test scores. 


Case E. Belgian national (French-speaking) aet. 26
Test Scores


                         Raw    Equivalent
Matrices (45′)            47       30
Reasoning (French)        25       34
T-M-K                     41       28*
T-M-K (Repeat)            44       26
Semeonoff-Vigotsky        60       31*
Carl Hollow Square        85       34*




The sum of three equivalent scores (including the mean of the three performance
tests marked *) is 95, yielding a Pemberley I.G. of 6 (a ‘high’ 6), suggesting
a position just below the 75th percentile of the general population. Assessed
on W.O.S.B. norms, this candidate would have been allocated an O.I.R. of 6
(a ‘low’ 6), i.e. not better than ‘low average’ officer level.
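
The arithmetic behind the total of 95 can be reproduced as follows. This is a minimal sketch using the equivalent scores tabulated above; the variable names are illustrative, and the conversion from the total to an intelligence grading is not attempted, since the cut-off points are not given in this appendix.

```python
from statistics import mean

# Equivalent scores for Case E.
written = {"Matrices": 30, "Reasoning (French)": 34}
starred = {"T-M-K": 28, "Semeonoff-Vigotsky": 31, "Carl Hollow Square": 34}

# The three starred performance tests enter the total as a single mean;
# the T-M-K repeat (unstarred) is left out.
performance_mean = mean(list(starred.values()))
total = sum(written.values()) + performance_mean
print(performance_mean, total)
```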


Performance tests 

(i) T-M-K. Scores indicate a performance below average for this population 
sample, and even somewhat below general population average; the repeat 
shows a falling-off in terms of equivalent score, suggesting inability to profit 
quickly by experience in an unfamiliar situation. 

On Boards A, B, and C performance was slow but fairly accurate, with little 
use of trial and error. For Board D the candidate collected six pieces at a 
time. He twice altered correct placements, and mentioned that he had not 
properly grasped the reduction of scale. 

On the Repeat he ‘collected’ four at a time from the start. He was obviously 
working under tension; he kept knocking the blocks against the edges of the 
frames, and tried to force the blocks into position. On Board D he was 
puzzled, as before. This series was definitely beyond him; perception seemed 
poor, and he was unable to recognize the correctness or otherwise of the 
patterns he made. He worked with a fair degree of persistence, adhering
rather rigidly to what he described as ‘a system of fixed corners’.


(ii) Semeonoff-Vigotsky. After a session lasting 42 minutes, during which he 
received 3 clues (the last being tantamount to a ‘forced’ clue, which he 
accepted unwillingly) no solution had been reached. As the candidate was 
due at a meal, the session was adjourned (a rare exception to normal practice). 
On his return he solved the problem after 3 more minutes. The examiner did 
not wish to suggest that collusion might have taken place; this may have 
been the case, although on the whole it is unlikely.

The first solution, at 5 minutes, was based on shape, trapeziums being
equated with triangles, and semicircles with circles [1st clue: large thin blue
trapezium].

The subject then produced two groups on the basis of height, and at 20
minutes divided them into ‘curved’ and ‘straight’ [2nd clue: large thick white
trapezium].¹


1 This should probably be regarded as an error in administration, since a large trapezium
is already exposed. A square block from either ‘straight’ group would have been more
appropriate, although in this case a colour previously exposed in a clue would have
had to be duplicated. As it was, the subject failed to profit from this ‘easy’ clue.




From this point onwards the subject ‘did’ very little (i.e. refrained from
manipulating the blocks), but verbalized freely, showing a sound logical 
approach, in that he saw that if one part of his grouping was at fault the other 
part must be wrong too. Deadlock ensued, and when it was suggested that 
he might tentatively ‘try out ideas’ (in order to gain a clue) even if he were 
not entirely satisfied with them, he became aggressively resistant [3rd clue, at 
37 minutes: small thin yellow triangle, from a grouping in which he reverted, 
under protest, to shape alone]. 

Following the resumption, as noted above, he again made two groups, by 
height, and quickly achieved the correct subdivision. The explanation was 
perfect. 


(iii) Carl Hollow Square. Unfortunately details of the component parts of the 
total score are lacking, but notes indicate that from Problem 7 onwards the 
subject seemed to be carefully economizing on moves. His approach was 
cautious and deliberate. He appeared to be more relaxed than during the 
other performance tests (perhaps because this one was administered by the 
psychiatrist, who, unlike the other examiner, was completely fluent in French); 
he maintained a good-natured attitude even when temporarily baffled by 
Problem 11. Throughout, he worked to a system, starting with the largest 
piece in each case, but he tended not to notice the ‘cut’ corners. He seemed 
to tire about the middle of the series. 


Other tests 
On the last model of the Meccano test the candidate broke down com- 
pletely. On ‘Observation’ tests (see Chapter 8, p. 99) he scored very poorly: 
reproduction of map¹ was grossly distorted, and his grasp of the significance
of this task was questionable. One question in another of the observation 
tests was totally misunderstood. 

Projective material is not now available, but a note to the effect that it was 
‘misleading’ survives. 


Board Members’ Reports 

1. President. An intelligent, educated type of considerable promise. Has 
high ideals and the courage of his convictions. Possesses definite powers of 
leadership and organization, and would certainly inspire confidence. 

His natural self-confidence and physical health are at present undermined 
as a result of his recent experiences . . . He should definitely be given a period
in which to recuperate .... 

1 An objectively assessed test calling for the reproduction, from memory, of a simple 
sketch-map previously exposed. 




Should ultimately be capable of a position of considerable responsibility. 
Grading: C+.
2. Senior M.T.O.¹ Showed flashes of high intelligence and insight, but was
most erratic. In the same way he showed many signs of natural ability to lead 
(good chairmanship, authoritative manner, excellent handling of others and 
thoughtfulness for others), but he was evidently not anxious to have to lead. 
He seems to have a strong though quiet personality, but he is in very bad 
condition mentally, nervous and occasionally losing his grip. This may 
account for his disappointing lack of planning ability. When normal he has 
clearly got great determination backed up by high ideals. He has a serious 
and earnest nature, and would certainly inspire confidence. 
He will be perfectly confident to act by himself after a rest, but it would be 
a pity to waste his leadership qualities. He would make a good organizer in
control of others, and be capable of excellent contact work. Grading: C+.


3. Second M.T.O. Appears to have good potentialities and abilities, but 
has been a continual disappointment: he lacks life and forcefulness and 
therefore fails to dominate the group as one feels he should. He has produced 
intelligent ideas and has shown spasmodic initiative, but on the whole he 
seems to suffer from a mental lethargy. If he pulled himself together and 
made the necessary effort he would be a good leader. Grading: C-.

4. Psychiatrist. Only fragmentary notes were recorded by the psychiatrist, 
who described the candidate as ‘over-intellectualized . . . an autocrat . . . with
a sense of tradition . . . strict in his religion. . . . Exaggerates his adultness, but 
shows youthful idealism in love and hero-worship. Variable in his self-control 
... rather reserved, inhibited, and solitary—but very fatherly to the others.’ 

Grading: C (predictively, B). 


On the basis of test results and incidental observation, the psychologist 
summed up his own impressions as follows: 

‘Intelligence on a low “B” level, considerably enhanced by natural leadership
quality and personal charm. Logical to a fault. Somewhat aggressive in
approach in some tests. Observation grading notably low, but this probably
unimportant in any capacity he might be required to undertake.’ Grading: C+.

It will be seen that all assessments, with the exception of the second
M.T.O.’s (and, by implication, that based on the projective material) agree
in rating this candidate as being of high average quality or better. There is
good agreement both on his capacities and on his limitations. Little more in
the way of comment is necessary, but from the point of view most immediately
relevant in this context it may be noted that this is a case of hardly better
than mediocre test intelligence, further handicapped by rigidity, and occasionally
yielding to stress. In ordinary life his social and cultural background
gives him immense support, so that he functions to the full limit of or even
beyond his capacity.

1 For a description of the functions of the M.T.O. (i.e. Military Testing Officer) see
Morgan (30).


Case F. Naturalized British subject, English-speaking (but see below), aet. 43


Test scores


                         Raw    Equivalent
Matrices (20′)            35       26
Reasoning (English)        9       25
T-M-K                     18       20*
T-M-K (Repeat)            32       21
Semeonoff-Vigotsky        20       36*
Trist-Hargreaves          72       35*


The Pemberley I.G. is (a low) 4, based on summed equivalent scores totalling
81, probably not higher than the 33rd percentile of the general population.
Note, however, the much higher level in the two concept-formation
tests. Weighted equally with the written tests either of these would have raised 
the I.G. to 5, with a point or two to spare. On the other hand, T-M-K plus 
the written tests would have reduced the I.G. to 2. The same tests assessed on 
W.O.S.B. norms yield a sum of equivalent scores which is 8 points below 
the upper limit of O.I.R. 0. 
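
The effect of substituting one battery for another can be seen by recomputing the sums. The sketch below uses the equivalent scores tabulated for this case and assumes that ‘weighted equally with the written tests’ means adding a single performance score to the two written scores; the variable names are illustrative. The resulting gradings (5 and 2) are those reported in the text and are not derived here, because the cut-off points are not given in this appendix.

```python
# Equivalent scores for Case F.
matrices, reasoning = 26, 25
tmk, semeonoff_vigotsky, trist_hargreaves = 20, 36, 35

written = matrices + reasoning

# Standard combination: written tests plus the mean of the starred tests.
standard = written + (tmk + semeonoff_vigotsky + trist_hargreaves) / 3

# Alternative batteries: written tests plus one performance test only.
alternatives = {
    "written + Semeonoff-Vigotsky": written + semeonoff_vigotsky,
    "written + Trist-Hargreaves": written + trist_hargreaves,
    "written + T-M-K": written + tmk,
}
print(round(standard), alternatives)
```

The spread between the highest and lowest alternative total (16 points) is what moves the grading from 5 down to 2.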


Performance Tests 

(i) T-M-K. On Board B the candidate gave up the first and second problems
almost at once, giggling self-consciously the while. In Series C he turned the
whole board sideways, and worked in that position. He reproduced the slope
of the diagonal line in Item 2 with a movement of his hand, suggesting that
he was making use of kinaesthetic imagery. On Board D he began with Item
2; he was able to analyse the pattern, but not to carry the analysis into effect.
Like ‘Case E’, he had difficulty with the reduced scale of the diagram.

In the Repeat, his attempt at Al was inverted. At the end of Series A he 
asked whether he was ‘doing better’. In Series B he gave up rather easily on 
the last block in item 1. 

He spoke a great deal throughout the test, asking about time-limits, and 
about the psychological implications of the relation between speed and age. 
His attitude was anxious and dependent. The performance in general was
careful and analytic, although not very successful.




(ii) Semeonoff-Vigotsky. Throughout the test the subject verbalized freely, in 
Russian. He tried colours first, counted them, and announced '5—too many’. 
Shapes were similarly rejected, as were ‘heights’, on the grounds of ‘too 
few’. Next he weighed the pieces in his hand, after which he asked himself 
(without touching the pieces): ‘Width? ‘Area?’ ‘Volume?—big, medium, 
small’ ‘Number of sides ?—or points?” 

At 6 minutes he produced an imperfectly worked-out grouping based on 
‘number of corners’ [clue: large thin yellow triangle]. 

He now reverted to ‘Thickness’ and tried out a tentative sample using 
pieces which matched in colour as well as in thickness. He then asked himself 
‘Can we make sub-groups, with thickness used as a primary classification?’
Verbalizing as before, he dismissed the possibilities of colour and of angles 
as a secondary criterion. He then tried matching shapes, as between the two 
main groups. 

At this stage he remarked, ‘Very interesting.’ Thereafter he reasoned aloud,
as follows: ‘A good classification would have each main group divided in
the same way. Is it areas? Or volumes? Probably there are more than two . . .
[at 15 minutes] . . . It's trivial classification: large and small’.

Explanation was recapitulated correctly. 


(iii) Trist-Hargreaves. From the second ‘two-group’ problem, the subject
began to look for complications. For the third ‘three-group’ problem he 
tried ‘inside’ colours, but remarked that not all ‘had an inside’. He asked 
whether six-group problems would be posed, and many other questions re- 
garding what was required. 

The first Pairs problem was correctly analysed verbally. At Problem 4 he 
remarked that for ‘the second half . . . one can just pick out by rim.’ At 
Problem 5 he inquired, ‘The old story?’ At Problem 7 he ‘saw a hitch’. 
Problem 10 he analysed verbally, concentrating on integrity, but reached no 
conclusion, and had to be stopped. 

Throughout, he examined the pieces carefully, especially as regards texture; 
he volunteered the information that he had ‘done work in the sense of touch’. 
He also remarked, ‘This one [i.e. the test] is like dominoes but I don’t play 
dominoes.’ 

Other tests

This candidate was one of a special category group who did not take the
aptitude tests. His performance on the ‘Observation’ tests was very poor
indeed.

Projective tests. The following ‘pointer’ was based on the usual combination
of Word Association, abbreviated T.A.T., and self description (as seen by a
good friend, and by a severe critic).


‘A peculiar man who describes himself as “too restless, aggressive, fanatical,
masochistic, sadistic—has no light touch”. 

He seems rather a detached schizoid type with very marked feelings of 
inferiority which have given rise to a compulsive drive to be master of every
situation in which he finds himself. He is fearful of failure, and a seeker after 
absolute perfection, meticulous in the true sense of the word, fussy and rather 
obsessional about detail. 

His general attitude to life is a very markedly religious one—again perhaps 
an outcome of his inferiority feelings—and he is filled with a most uncommon 
love for and belief in human beings. He needs to trust others and really to 
rely upon them, but he seems at times to be a little disappointed and self- 
pitying. In spite of his pervasive and embracing humanitarianism he feels 
that the loss of works of art is greater than that of human life. He has a con- 
tempt for money and ambition and is perhaps prone to lay the blame for his 
failures on external conditions. There is much underlying aggressiveness in 
his make-up, and much accompanying submissiveness. His own insight into 
his personality is fairly good as far as it goes. He is a disordered type with 
possible schizophrenic tendencies. His intelligence is poor and it is difficult to 
see him being of any use.’ Grading: F. 

By way of supplement to the above, a sample of this candidate’s projective 
material is appended. The following is his response to Murray’s well-known 
‘mother and son’ picture (6 BM) (Murray, 33). It should be explained that
the candidate did the projection tests on his own portable typewriter, and 
that the reproduction of spelling, division of lines, etc. is accurate. 


A yong engeener has been shoked by the broutalyty og germans in plannibg so 


well thechniqualy the domination of the word and the use of the best scientific 
means to carry out theri ideas He was brouding about the terrific suffering of 


peple in eaurope all the starvation and distruction of the finest peaces of arts 
and sujuction of sciences he makes up his mind to fight to the ourmost inthe 
field where his background and knowledge are of use . To meet fire with fire . 
he goes to his mother and is about to tell her that he going to join a dange 
rous work as a civilian he can not tee! her ДИА fit that he fill be an indust 
rial saboteur but he has to tell her fat he will use brutality and shade bollod. 
he is contemplating in the future the price of civilian lives in the act he 

will do it and be ruthless as the ennimy intself. But not to wonded and help 
les he can not kill children and women ‚No he will control that.


Board members’ reports 
1. President. A very unusual type of man with whom caution and meticulous
attention to detail are obsessions. He is dependent on others for all
practical suggestions, has no natural resourcefulness and his ideas on propaganda
and psychological warfare are most nebulous and inchoate. He is
rigid and inelastic and finds the utmost difficulty in adapting himself to novel 
situations. 

He is earnest, sincere, and highly conscientious, and at times his human 
contacts can be sympathetic and even inspiring, but he will flounder and dither 
in the presence of any practical situation. 

He cannot be recommended for any sort of work . . . despite his consider- 
able knowledge of languages. It should be added that he has picked up a 
good deal of knowledge about [confidential matters] and is anything but 
discreet in his disclosures. Grading: F. 

2. M.T.O. Of the 12 M.T.O. tests, one was definitely above average, 6 were 
of average grades, and 5 were failures, but his case warrants special analysis 
because of the qualitative characteristics of his behaviour, which are so 
unusual. ... 

(1) Physically he lacks stamina, agility, and co-ordination. It is doubtful 
if he could complete the Physical Training programme in training. Any work 
requiring physical exertion is out of the question. 

(2) He continually apologizes and makes excuses for his failures. 

(3) He is terribly preoccupied with details. In the ‘Emergency Situation’ 
instead of looking into the bureau for the secret message, which was the 
obvious place, he spent much time in the beginning in carefully examining 
the legs of the chairs. 

(4) He is dependent on others for help, guidance, and ideas. In his ‘Leadership
Situation’ he turned the problem over to another. He lacks self-confidence
and resourcefulness.

(5) He is almost always serious, even pessimistic. He rarely smiles or laughs.

(6) He always insisted that the instructions be repeated. ‘They are not
quite clear’ was his invariable comment. He is careful and cautious in the
extreme.

(7) Not infrequently he would offer to ‘sacrifice his life’ on group tasks.

(8) Time is an obsession to him: ‘Do I have enough time?’ or ‘How much
time is left?’

(9) He is rigid and inflexible in his behaviour.

The picture presented in the foregoing comments is not an exaggeration.
To [him] even the simplest situation is a problem of baffling complexity. He
is definitely not suited for any work in the field in any circumstances.

But three other characteristics might be mentioned which may prove useful
in getting him other work:

(a) He seems to be interested in, and knows a lot about, foreign countries.

(b) He has an analytical mind on problems involving human behaviour.


(c) In his ‘Approach Test’ he performed splendidly. His job was to inter- 
view a recruit. In this person-to-person situation he was friendly, cordial, 
and systematic. He expressed himself excellently and spoke with feeling as 
well as with dignity. He was calm and confident and had plenty of common 
sense. He was so reassuring as to be almost inspiring. Grading: F. 

3. Psychiatrist. A man who appears to be very much better than some of 
his test results show. He is highly idealistic, warm, sympathetic, with great 
feeling for the underdog. Tolerant and generous, he is especially interested in 
psychological warfare. He has imagination and considerable grasp of many 
problems, but is vague, unpractical, and very much the dreamer. Honest, 
sincere, and very genuine, he is a bit melancholic, and would not be easily 
placed. Grading: F. 

The psychologist’s impressions, from all sources (excluding projective 
material) were: 

‘A most peculiar man in the early forties, whose test results are most inconsistent.
He is an extremely poor writer, but has an excellent logical mind,
and analyses a problem verbally with great confidence. On the other hand his 
written work is practically illiterate, with a degree of confabulation verging 
on the schizophrenic. In outside work he was ineffective, and though he had 
some ideas no one seemed to want to listen to him. A bit of a showman, and 
not too anxious to undertake anything difficult. 

‘Ranks fourth for leadership¹ (including his own rank of first), and rather
lower in the other counts.’ Grading: F.

Once again it is probably best to allow the reader to work out the detailed 
diagnostic indications for himself, in the light of the board members’ and 
other reports. Within the more circumscribed field of cognitive test results 
proper, the most striking feature is the high level of performance in the con- 
ceptual tests, and the margin of equivalent score points (15, or 3 S.D. units) 
separating these from the T-M-K performance. Any suggestion of language 
difficulty arising from the low Reasoning score and the poor quality of the 
written material may be dismissed since the Matrices score is almost equival- 
ent to the Reasoning score and the T-M-K is lower still. It may be noted, 
however, that assessment as a W.O.S.B. candidate on norms uncorrected for
age (see Appendix II) would have penalized his Matrices equivalent score by 
no less than 16 points. There are pathological elements in all aspects of this 
candidate's performance in every part of the assessment procedure, but they 
point in a diversity of directions—some even to the possibility of brain 
damage. Yet his conceptual thinking is most conspicuously unimpaired. 

1 This refers to ‘mutual evaluation’, a form of sociometric assessment in which candidates
were asked to rank members of the group for leadership, ‘lone wolf’ role, and other qualities.
‘Voting for oneself’ was not barred.




What emerges most clearly is the inadequacy in this case of a ‘global’ estimate 
of intelligence. That an individual of I.Q. somewhere around 90 should be
capable of the subject’s powers of analysis, or even of being taught to think 
in that way, is hardly conceivable. 

The general pattern of the above two cases is one of consistency in findings 
as between contrasted approaches to the assessment of the individual, 
associated with discrepant yet not wholly irreconcilable test results. The four 
following cases further illustrate this latter point. They represent the most 
extreme cases encountered in a study (Semeonoff, 39) of the distributions 
of test scores carried out with the intention of discovering to what extent 
alternative batteries selected from the total range available at Pemberley could 
be expected to yield identical intelligence gradings (see Appendix II). A
brief summary of the quantitative findings follows the case descriptions. 


Case G
Test scores

                         Raw    Equivalent
Matrices (45′)            42       26
Reasoning (French)        15       28
T-M-K                     63       36*
T-M-K (Repeat)            57       31
Semeonoff-Vigotsky        15       38*
Carl Hollow Square        99       39*


The intelligence grading, using the standard combination of scores, is 6. 
On the written tests only, correcting to an S.E.S.3 (see p. 86), the grading 
would be 4, whereas a grading on the basis of the (starred) performance tests 
only would be a clear 9.

The noteworthy features are thus: (1) all performance tests, whether spatial 
or conceptual, on a much higher level (about 2 S.D. units) than the written 
tests; (2) the marked drop on T-M-K repeat—raw as well as equivalent score. 
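The size of the written-versus-performance gap in Case G can be checked by simple arithmetic on the equivalent scores tabulated above. The following is only an illustrative sketch: the names `case_g`, `WRITTEN`, `PERFORMANCE` and `mean_equivalent` are hypothetical, the unweighted mean is an assumption made for illustration, and the sketch does not reproduce the Pemberley conversion from equivalent scores to Intelligence Gradings, which is not given in this passage.

```python
# Equivalent scores for Case G, copied from the table above.
# Only the starred performance tests are included, as in the text.
case_g = {
    "Matrices": 26,
    "Reasoning (French)": 28,
    "T-M-K": 36,
    "Semeonoff-Vigotsky": 38,
    "Carl Hollow Square": 39,
}

WRITTEN = ("Matrices", "Reasoning (French)")
PERFORMANCE = ("T-M-K", "Semeonoff-Vigotsky", "Carl Hollow Square")


def mean_equivalent(scores, tests):
    """Unweighted mean of the equivalent scores for the named tests
    (an assumption for illustration, not the manual's grading rule)."""
    return sum(scores[t] for t in tests) / len(tests)


written_mean = mean_equivalent(case_g, WRITTEN)
performance_mean = mean_equivalent(case_g, PERFORMANCE)
gap = performance_mean - written_mean

print(f"written mean:     {written_mean:.1f}")
print(f"performance mean: {performance_mean:.1f}")
print(f"gap:              {gap:.1f} equivalent-score points")
```

On these figures the performance tests stand some ten to eleven equivalent-score points above the written tests, which is the gap the text describes as about 2 S.D. units.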

In the Semeonoff-Vigotsky test, after a preliminary grouping by colour
(which he did not complete) the subject got on to size; while obviously making 
towards the correct solution, he asked about the number of pieces in each 
group. He appeared to construct the various groups empirically rather than 
to carry out a principle, and the first solution offered (at 8 minutes) contained
only one error (the large thick white trapezium included with the ‘small thick’
pieces). This ‘clue’ was given, and two minutes later he had corrected and 
checked his grouping. The explanation was ‘pieces about the same size and the 
same height’. Questioning revealed that ‘about’ merely meant that he was not 
sure that the areas were mathematically equal.



In contrast to the high performance test scores, Meccano was not out- 
standing, though above average (grade 2 on the six-point scale). The three 
‘Observation’ tests were on about the same level. 

Examination of the pattern suggests that this subject did best when working 
against time (cf. lower score on the virtually untimed Matrices, as against 
Reasoning; fall on T-M-K repeat; Meccano is also a test less influenced by the 
time element than either T-M-K or Carl Hollow Square). The nature of the 
Semeonoff-Vigotsky explanation also suggests a ‘perceptual’ approach. Con-
trary to the common interpretation of high performance as against low 
written test scores, there is no suggestion of a deficiency on the v:ed side. 


Case H
Test scores
                        Raw    Equivalent
Matrices (45’)           56        36
Reasoning (French)       22        32
T-M-K                    38        27*
T-M-K (Repeat)           43        25
Semeonoff-Vigotsky      135        20*
Carl Hollow Square       60        25*


Intelligence grading, 6; on written tests, 8; on performance tests, 2. This 
pattern is a clear inverse of that of Case G, just discussed; it is also one of two
showing the highest all-over discrepancy encountered between written and
performance tests. All performance tests are on a lower level than the
written tests, and the Semeonoff-Vigotsky, which might have been expected 
to be more in line with the written tests, is lowest of all. 

Performance on the Semeonoff-Vigotsky was uneven: the significance of 
height was realized after about 14 minutes, following a single clue given after 
a loosely shape-determined solution. There was much mildly obsessional 
ranging of the blocks in straight lines, etc., and careful matching of lengths 
of sides. From about 30 minutes onwards groups which had apparently been
carelessly completed were made, height being adequately dealt with, and size 
very imperfectly apprehended. Nine clues were required, and the solution 
reached at 60 minutes. Only a very vague explanation in terms of ‘size’ was 
offered. 

Meccano was below average, and ‘Observation’ (except for map-reproduc- 
tion) very low. 

There were indications of inferiority feeling, and it seems that written test 
situations temporarily freed the subject from this disability. There is also a
possibility that this subject (a Belgian) might have been more at home in 
Flemish, and that he may have elected to do the tests in French on the 




supposition that it would be more acceptable to the Board, as well as to his 
fellow-candidates. This might account for the relatively low Reasoning score, 
and the more markedly low scores on other tests involving the use of language 
(including vis-à-vis the tester)—even to the pattern of his ‘Observation’ 
scores. The poor Semeonoff-Vigotsky performance, however, cannot be 
wholly explained on this assumption. 


Case J
Test scores
                        Raw    Equivalent
Matrices (20’)           39        29
Reasoning (French)        И        22
T-M-K                    56        33*
T-M-K (Repeat)           60        32
Semeonoff-Vigotsky        2        39*
Trist-Hargreaves         54        29
Carl Hollow Square       99        39*


Intelligence grading, 5; on written tests, 3; on performance tests (as
starred), 9. The second case of a discrepancy of 6 I.G. points between the
written and performance test gradings. Note, however, that the inclusion of
Trist-Hargreaves (at this point undergoing re-standardization) in place of
Semeonoff-Vigotsky would reduce the I.G. based on performance tests to a
low 8.

The Semeonoff-Vigotsky solution was reached in 2 minutes, without clues 
— practically a ‘ceiling’ performance. No attention at all was paid to colour, 
and it is noteworthy that the subject failed on both the Trist-Hargreaves 
three-group problems which involve edge colour. He also showed some 
anxiety lest he should exceed the (of course non-existent) time-limit in the 
Trist-Hargreaves Pairs problems. 

Meccano was grade 2; ‘Observation’, grade 4. 

The most obvious source of the pattern would appear to be limited educational
opportunity.


Case K
Test scores
                        Raw    Equivalent
Matrices (20’)           32        24
Reasoning (Polish)        9        26
T-M-K                    58        34*
T-M-K (Repeat)           65        34*
Carl Hollow Square       74        30*


This case is included partly because it presents certain rather unusual 
problems of assessment. The equivalent score for Reasoning was originally 
quoted as 25, but accumulation of fresh data and re-calculation suggest that 
26 is a truer estimate. Substituting the higher figure has the effect of raising 
the inferred ‘written’ I.G. from 2 to 3; the all-over (standard) I.G. is 4 in 
either case, and the ‘performance’ I.G. 7, but it should be noted that gradings 
involving performance test scores are obtained by counting T-M-K Repeat 
as a ‘separate test’, as was customary when only two performance tests had 
been administered. 

This candidate was one of a group with a uniformly poor educational 
background, but the pattern of his scores differs from that of Case J in that 
Matrices, and not Reasoning, represents his low point. None of his perform- 
ance test scores reaches a particularly high level, although T-M-K is well 
above average, and this level is maintained on repeat. The very low Matrices 
score would probably repay investigation; the impression is of a stable per- 
sonality of at least average intelligence, but slightly diffident and ‘slow to 
start’. Meccano was grade 2, but the two ‘Observation’ tests were in the two 
lowest grades. 




REFERENCES 


1. ALEXANDER, W. P. ‘A new Performance test of intelligence.’
Brit. J. Psychol., Vol. 23, pp. 52-66, 1932.

2. BECK, S. J. Rorschach’s Test. New York: Grune & Stratton. Vol.
I, 1944; Vol. II, 1945.

3. BELL, J. E. Projective Techniques. New York and London:
Longmans, 1948.

4. BENDER, LAURETTA. ‘A visual motor Gestalt test and its
clinical use.’ Res. Monogr. Amer. Psychiatr. Ass., No. 3, 1938.

5. BUROS, O. K. (ed.) The Nineteen-Forty Mental Measurements
Yearbook. Highland Park, N.J., 1941.

6. BUROS, O. K. (ed.) The Third Mental Measurements Yearbook.
New Brunswick: Rutgers University Press, 1949.

7. CARL, G. P. ‘A new Performance test for adults and older
children: the Carl Hollow Square.’ J. Psychol., Vol. 7,
pp. 179-99, 1939.

8. COLLINS, MARY and DREVER, J. A First Laboratory Guide in
Psychology (2nd ed.). London: Methuen, 1934.

9. FERGUSON, G. O., jr. ‘A series of Form boards.’ J. exper.
Psychol., Vol. 3, pp. 47-58, 1920.

10. FOSBERG, I. A. ‘A modification of the Vigotsky Block-test for
the study of the Higher thought processes.’ Amer. J. Psychol.,
Vol. 61, pp. 558-61, 1948.

11. GOLDSTEIN, K. ‘The significance of psychological research in
schizophrenia.’ J. nerv. ment. Dis., Vol. 97, pp. 261-79, 1943.

12. GOLDSTEIN, K. and SCHEERER, M. ‘Abstract and Concrete
behaviour: an experimental study with special tests.’ Psychol.
Monogr., Vol. 53, No. 2 (whole no. 239), 1941.

13. GUILFORD, J. P. Fundamental Statistics in Psychology and
Education. 2nd edition. New York and London: McGraw-Hill, 1950.


14. HALSTEAD, H. and SLATER, P. ‘An experiment in the vocational
adjustment of neurotic patients.’ J. ment. Sci., Vol. 92,
pp. 509-15, 1946.

15. HANFMANN, EUGENIA, and KASANIN, J. ‘A method for the
study of Concept formation.’ J. Psychol., Vol. 3, pp. 521-40,
1937.

16. HANFMANN, EUGENIA, and KASANIN, J. Conceptual Thinking
in Schizophrenia. New York: Nervous and Mental
Disease Monographs (No. 67), 1942.

17. HARRIS, H. The Group Approach to Leadership Testing. London:
Routledge and Kegan Paul, 1949.

18. KENT, GRACE H. Review (273) in 6, pp. 352-5.

19. KLOPFER, B., et al. Developments in the Rorschach Technique:
Vol. I: Technique and Theory. London: Harrap, 1954.

20. KOHS, S. C. Intelligence Measurement: a Psychological and
Statistical Study based upon the Block-design Tests. New York:
Macmillan, 1923.

21. LADD, F., in BRUNK, L. U. ‘Stability of personal approach in
two tests of concept formation.’ Unpublished M.A. thesis,
University of Kansas, 1954.

22. LAIRD, A. J. ‘Some quantitative and qualitative aspects of
testing.’ Unpublished degree thesis, University of
Edinburgh, 1952.

23. LINCOLN, E. A. ‘The reliability of the Lincoln Hollow Square
form board and a comparison of Hollow Square scores with
Stanford-Binet mental ages.’ J. appl. Psychol., Vol. 15, 1931.

24. LOVELL, K. ‘A study of intellectual deterioration in adolescents
and young adults.’ Brit. J. Psychol., Vol. 46, pp. 199-210, 1955.

25. LOWENFELD, V. The Nature of Creative Activity. London:
Routledge & Kegan Paul, 1939.

26. MCCALL, W. A. Measurement. New York and London:
Macmillan, 1939.




27. MACNEILL, FLORENCE E. ‘An ... its relation to intelligence
and special ...’ Ph.D. thesis, University of Edinburgh, 1942.


28. MINISTRY OF INFORMATION, with WAR OFFICE. Personnel
Selection in the British Army, 1944—Officers (film). London:

29. MONS, W. Principles and Practice of the Rorschach Personality
Test. London: Faber, 1947.

30. MORGAN, W. J. Spies and Saboteurs. London: Gollancz, 1955.

31. MORRIS, B. S. ‘Officer selection in the British Army, 1942-1945.’
Occup. Psychol., Vol. 23, pp. 219-34, 1949.

32. MURRAY, H. A. et al. Explorations in Personality. New York:
Oxford University Press, 1938.

33. MURRAY, H. A. Thematic Apperception Test Manual. Cambridge,
Mass.: Harvard University Press, 1943.

34. RAPAPORT, D. Manual of Diagnostic Psychological Testing.
Chicago: Year Book 1946.

35. RAVEN, J. C. ‘The R.E.C.I. series of Perceptual tests: an
experimental survey.’ Brit. J. med. Psychol., Vol. 18, pp. 16-34, 1939.

36. RAVEN, J. C. ‘Standardization of Progressive Matrices, 1938.’
Brit. J. med. Psychol., Vol. 19, pp. 137-50, 1941.

37. RAVEN, J. C. Guide to using Progressive Matrices (1947), Sets
A, Ab, B. London: Lewis; Harrap, 1947-51.

38. RAVEN, J. C. and WALSHAW, J. B. ‘Vocabulary tests.’ Brit. J.
med. Psychol., Vol. 20, pp. 185-94, 1944.

39. SEMEONOFF, B. ‘The interchangeability of tests in an Intelligence
...’ Paper read at a meeting of the Experimental Psychology
Group, 1951.

40. SEMEONOFF, B. ‘Projective and other predictors of academic
...’ Bull. Brit. psychol. Soc., No. 26, p. 51, 1955.


41. SEMEONOFF, B. and LAIRD, A. J. ‘The Vigotsky test as a measure
of intelligence.’ Brit. J. Psychol., Vol. 43, pp. 94-102, 1952.

42. SHIPLEY, W. C. ‘A Self-administering scale for measuring
intellectual impairment and deterioration.’ J. Psychol., Vol. 9,
pp. 371-7, 1940.

43. TRIST, E. L. ‘Short tests of low-grade intelligence. III.’ Occup.
Psychol., Vol. 15, pp. 120-8, 1941.

44. TRIST, E. L. and TRIST, V. in ‘Discussion on the quality of mental
test performance in intellectual deterioration.’ Proc. Roy. Soc.
Med., Vol. 36, pp. 243-9, 1943.

45. VERNON, P. E. ‘Psychological tests in the Royal Navy, Army and
A.T.S.’ Occup. Psychol., Vol. 21, pp. 53-74, 1947.

46. VERNON, P. E. and PARRY, J. B. Personnel Selection in the
British Forces. London: London University Press, 1949.

47. WECHSLER, D. The Measurement of Adult Intelligence. Baltimore:
Williams & Wilkins, 1944.

48. WECHSLER, D. Manual for the Wechsler Adult Intelligence Scale.
New York: Psychological Corporation, 1955.

49. WEIGL, E. ‘On the psychology of so-called processes of
abstraction’ (trs. Margaret J. Rioch). J. abn. soc. Psychol., Vol. 36,
pp. 3-33, 1941.

50. WYATT, F. ‘The scoring and analysis of the Thematic
Apperception Test.’ J. Psychol., Vol. 24, pp. 319-30, 1947.




INDEX 
(Principal entries are set in heavy type.) 
Abbreviations: CHS: Carl Hollow Square; O.I.R.: Officer Intelligence Rating;
P: Passalong; S-V: Semeonoff-Vigotsky; T-H: Trist-Hargreaves;
T-M-K: Trist-Misselbrook-Kohs; W.O.S.B.: War Office Selection Board.

“А” battery (Garston), 103, 104, 105 ‘Bell, J. E., 53, 163 
Abstract attitude, 41, 50 Bender, L. A., 53, 163 

‘reasoning, 123 Bender-Gestalt test, 53 
вк (Naval Group Test 1), bir factors, 108, 110, 111, 112, 

7 

Abstraction tests, 6, 95 Bizarre responses, 25 
Ach, N., 8 Black and white ‘Kohs’, 49, 54 
Adequacy of verbalization, 27 Blocking, 12, 38, 40, 151 
Aesthetic sensibility, 38 British Peycbologic! Society, 112, 


Age correction, 97, 123 


Age, effects of, 154 Buildi ith unl A 
£ A ‘Building’, with сопсер 
Age-group norms, 83, 127 30 um ТА 19, 36 


Age norms, 94 
Aggressivity, 152 Buros, 0. K., 163 ч 
Alexander, W. P., 6, 70, 75, 77, 78, 163 sa aed Classification Test, 


Ambiguous Shapes (test), 6, 27 


Brunk, L. U., 164 


Canadian Figure Analogies, 96, 97 


Amended O.LR., 119 Carl P. G., 55, 58, 60, 61, 62, 63, 64, 
Analogies, 96 ü 66, 163 
А , i 
Aate test approach, 46, = 61, Cari; Hollow 5 var Test, à 4%. 
SIT, 80, 81, , , , 
Anomalies, in factors, 110 106, 109, 116, 11Î, 114, 160 
in O.LR., p accidental сайын 62 
іп norms, orms, 94, 
in performance, 104, 119 basic score, 59 
Anstey, E., 99 bevelled edges, 55, 56, 62, 65, 66 
Anxiety, 21, 38, 49, 50, 58, 66, 132, bonus score, 60, 68 ` 
135, 137, 142, 154, chance success, 61 
Army record, 130, 133 check on totals, 60 
AE erc in, 28, 30 com ion with PS n 72, 74, 
ttitudes, 49, ‚ 77, 78, 100, 109, 
Average private, 91 with T-H, 65 
Ver ving officer, 91, 131 with T-M-K, 49, 51, 66 


correlations between score com- 


qe Garston), 103, 104, 105 
B’ battery (Garston), e ponents, 54, 67 


Beck, S. J., 51, 163 


Carl Hollow Square Test, 
counting moves, 58, 59, 61-2 
demonstration, 58 
description of performance, 152 
failure, 59, 60 
feeling edges, 66 
general population norms, 61 
graphic representation of scores, 


handling of pieces, 66 

hollow square, 55, 56, 57 

layout of materials, 56-7 

learning, 63-4 

orientation, 58, 65 

outside square, construction of 
solution, 66 

penalty score, 60, 68 

personality indications, 63 

points for further investigation, 67 

problems, 3, 58, 63; 5, 58, 63; 11, 
65, 152 

progression in difficulty, 58, 59 

rate correction, 59, 60-1, 67 

running commentary, 57 

schedule of presentations, 56, 68 

scoring sheet, 57, 60 

sequence of problems, 63-4, 67 

shortening test, possibility of, 102 

speed, 58 

squareness of ‘hole’, 58, 65 

standardization of scoring method, 60

‘standard’ rate of working, 60-1 

tally of moves, 57, 59, 60 

time limits, 58, 59, 60, 67, 68 
Case studies, 126ff, 150ff

Case A, 131

Case B, 132 

Case C, 132 

Case D, 133 

Case E, 21, 150ff, 154 

Case F, 154ff 

Case G, 159, 160 

Case H, 160 

Case J, 161, 162 


Case Studies, 
Case K, 162 

‘Ceiling’ scores, 86 

Cerebral lesion, 4, 158 

Chambers’ Twentieth Century Dic- 
tionary, 125 

Chance solution, 16, 22 

Children, 55, 61, 70, 94, 97 

Clinical interview, 6 

Сс (test), 94, 97, 99, 108, 110, 

Collins, Mary, 94, 163 

Collusion, 22, 151 

Colour relationships, 25, 36, 37 

Colour shock, 28, 49, 54 

Compulsive thinking, 20, 65 

Concept formation, tests of, 6, 49,
107, 108, 110, 154, 159 

Conceptual factor, 110, 111, 142 

Conceptual shift, 28, 41 

Conceptual thinking, 143, 158 

‘Concrete’ intelligence, 5 

Confirmatory tests (W.O.S.B.s), 83, 
95, 100, 113, 119 

Cookson, Joyce, 112

Correction for restriction of range, 
102, 108 

Correlations, 98ff, 108, 113, 149, see 
also separate Test entries 

Criteria sheets, 121, 143 

‘Criterion’ tests, 98, 106, 110 

‘Crystallized’ intelligence, 123 

Cultural influences, 5, 21, 120, 136, 

Cyclostyled Instructions, 35, 56 


Defensive attitude, 50 

Definition by example, 125 

Definitions, 120, see also Mill Hill 
Vocabulary 

Demonstration, 42, 47, 58, 72, 76 

Depressive trend, 22, 37 

Depth interpretation, 51 




Detachment, 112 

Diagnostic indications, 120, 150 
testing, 2 

Dictionary, 125, 136 

Difficulty, progression in, 58, 59, 77,


Diffidence, 162 

Discrepancies in Intelligence Grading, 114ff, 159ff

Disorientation, 46 

Disposal gradings, 150 

Dissociation, 50, 52 

Distribution, sorting by, 16, 19, 23,

Disturbed subjects, 126, 131 

Drever, J. (sen.), 94, 163 

D.S.P. (Directorate for the Selection 
of Personnel), 6, 81, 99 


Earl, C. J. C., 6

Educational factor, 111 

Educational standard, 91, 119, 120, 
130, 131, 132, 134, 136, 162 

Ego-defence, 22, 66 

Elliott, Elsie, 39 

parallel form of T-H, 39, 40 

Emotional reactions, 21, 38 

Encouragement, 48, 59, 72 

End-spurt, 95 

Enforced help, reaction to, 76 

Engineering, 66 

Equivalent scores, see Standard 
Equivalent Scores 

Extrapolated scores, 86 

Extrapunitiveness, 50, 76 

Factor loadings, 106-7, 111, 112, 141 

Factorial studies, 5, 27, 55, 98, 102, 
103, 106ff, 119 

Failure, 19, 59, 60, 72-3 

dus (nose (Pemberley), 100, 

Ferguson, G. O. (jr.), 163

Ferguson Form-boards, 55 




Film of officer selection, 7, 165 

Fisher, R. A., 106, 4 
Fluency, 126, 130, 133 

Fluidity, 63 

Form-board tests, 5, 6, 55, 70, 107,


НУ (sorting test: Weigl), 6, 


Fosberg, I. A., 18, 19, 163 
Frustration, 53, 58 
Functioning beyond capacity, 154 


g (factor), 95, 96, 107, 110, 123, 142, 

Garston (No. 14 W.O.S.B.), 5, 55, 
67, 81, 82, 91, 96, 98, 99, 100, 
103, 105, 109, 110, 119 

General population, 106 

norms, or standards, 61, 80, 81, 82, 

91, 97, 121, 151, 154 

‘General intake’ samples (Army), 123 

Gestalt qualities, 46, 

Goldstein, K., 4, 28, 41, 46, 50, 52,
163


G. P. I. patients, 27 

Grade of solution, 13, 14ff, 23 
Grouping (unpublished test), 141 
Group testing, 4, 39 

Guessing, 126, 132 

Guilford, J. P., 102, 163


Halstead, H., 66, 164 

Handling of blocks, 50, 100 

Handwriting, 131 

Hanfmann, Eugenia, 8, 13-14, 20, 
164 


Hanfmann-Kasanin test, 6, 8, 13, 22 
specifications, 24 

Haptic type, 66 

Hargreaves, R. G., 28 

Harris, H., 7, 164 

Horizontal-vertical illusion, 65 

Hypomania, 131 

Hysteria, 132 




Illness, 131, 134 
Impatience, 57 
Incentives, 21 
Inferiority feeling, 160 
Information, tests of, 100, 120, 123 
Inhibition, 51, 53 
Inquiry, 22, 28 
Insecurity, 22, 38, 65 
Instability, 37, 52, 76 
Intelligence Grading (I.G.: Pember- 
ley), 6, 78, 86, 112ff, 150, 151, 
154, 159, 160, 161, 162 
Intelligence quotient, 61, 66, 94, 97, 
121, 159 
Interchangeability of tests, 98, 112ff 
Inter-Services Research Bureau, 1 
Intuitive approach, 66 
Irregularity, 20, 36
Item analysis, 139, 148 


Kasanin, J., 8, 13-14, 20, 164 

Kent, Grace H., 57, 61, 164 
Kinaesthetic imagery, 154 

Klopfer, B., 164 

k:m (factor), 107, 109, 142 

Kohs' Block Designs Test, 4, 6, 41, 46 
Kohs blocks (material), 42, 53 
Kohs, S. C., 164 


Ladd Block test, 23, 28 

Ladd, F., 164 

Laird, A. J., 9, 14, 23, 81, 82, 164, 166 
Language difficulty, 158 
Laterality, 65 

Learning, 4, 41, 51, 63-4, 76-7 
Learning theory, 77 

Letter-digit substitution, 94 
‘Letter’ problems (Reasoning), 148 
Level of aspiration, 20 

Limits testing, 4, 34 

Lincoln, E. A., 55, 164 

Lincoln Hollow Square (test), 55 


Loosening, of conceptual span, 37 
Lovell, K., 39, 164 

Lowenfeld, V., 66, 164 

Low-grade intelligence, 28 


McCall, W. A., 91, 164 

Macdonald, Elsie, see Elliott 

MacNeill, Florence E., 6, 165 

Map reproduction (test), 152, 160 

Matrices, Raven's, see Progressive 
Matrices 

Matrix-type problem, 28 

Mean weighted correlations, 106 

Meccano test (Pemberley), 99, 101, 
108, 152, 160, 162 

Mechanical aptitude, 66, 99 

Mechanical counter, 57 

‘Mec’ test (S. P. Test 24), 99 

Mental age, 94, 97 

Mental Measurements Yearbooks, 55, 163


Mill Hill Emergency Hospital, 1, 3, 4,
27, 41


Mill Hill Vocabulary Test, 97, 120, 
123, 124ff, 136 
compared with Wechsler Vocabu- 
lary, 120, 123 
complex responses, 125 
definitions, 123, 125, 126, 129, 131,
132, 133 
deliberate low scoring, 126 
partial credits, 123, 125 
scoring key, 125 
speed, 125 
synonyms, 125, 126, 129, 131, 132 
time-limit, 125 
Misselbrook, B. D., 6, 41, 49 
Misunderstanding, 152 
Mitchell, A., 104, 135, 141 
Mitchell Vocabulary Test, 97, 110, 
135ff 


anxiety, 137, 142 
correlations, 140 




Mitchell Vocabulary Test, 
criteria, 143ff 
difficulty, progression in, 136 
earlier drafts, 137-40 
factor analysis, 141-2 
‘reason’ for test, 135-6 
split-half reliability, 140 
time-limits, 137 
Mixing pieces, 30, 51 
Mons, W., 52, 165 
Morgan, W. J., 1, 153, 165 
Morris, B. S., 7, 165 
Morse Aptitude (S. P. Test 10), 99 
Motivation, 98, 124, 130
M.T.O. (Military Testing Officer)
reports (Pemberley), 152-3, 157-8
Murray, H. A., 41, 153, 156, 165
Mutual evaluation, 158 


Narrowing, of conceptual span, 37 

Neurotic subjects, 27, 28, 66 

N.I.I.P. Group Test 33, 95, 97, 141

Noegenetic principles, 6 

Non-verbal intelligence, 130, 131 

test material, 135 

Norms, 80, 96, (see also Age-group, 
Age, General population, Officer 
population, Pemberley Popula- 
tion, Percentile, Standard Equiv- 
alent Scores, T-scores) 

Number problems (Reasoning), 148 

Number series, 95 


Object sorting (test), 37, 141 
Observation Tests (Pemberley), 99, 
152, 155, 160, 161, 162, 
Obsessive-compulsive syndrome, 20, 
66, 139, 160 
Occupational record, 119, 130 
Occupational requirements, 47 
Officer Intelligence Rating (O.I.R.),
67, 78, 86, 91, 119, 125, 131, 132,
151,
Officer (population) norms, 80, 81,
$1. 6596, 97, 104, 123, 126 


Open-ended problems, 46, 95 
Oppositional traits, or trend, 17, 22, 
Organic lesion, 28, 51, 131 
Organizational ability, 51 
Over-confidence, 38 


Parallel forms of tests, 23, 39-40 
Paranoid tendency, or traits, 17, 20,


Parry, J. B., 99, 123, 166 

Partial credits, 121, 125 

Passalong Test, 3, 70ff, 81, 96, 108,
basic score, 74 
bonus score, 74, 78-9 
boxes, 70, 71 


comparison, with CHS, 6, 70, 72, 74, 76,
77, 78, 100, 109, 112
with original version,
counting moves, 72, 75 
demonstration, 72, 76 
difficulty, progression in, 77 
end-positions, 70, 71, 72, 73 
enforced help, 76 
failure, 72 
‘habit-breaker’ (discarded item), 77
learning, 76 
penalties (deductions), 72, 74 
penalty score, 74, 78-9 
points for further investigation, 78 
practice item, 74 
problems, 3, 71-2; 6-9, 75; 7-9, 71; 7, 71; 8, 71, 75, 77; 9, 75; 10
(additional), 71, 75,
rate correction, 74, 78, 79 
replacement at starting position, 72 
rotation, 76, 77 
sequence of problems, 76
setting-up problems, 71-2 
specifications, 71 
speed, 76 
standardization of scoring, 74 


Passalong Test, 
sub-goals, 70 
tally of moves, 72 
time-limits, 72, 74, 78 
Pattern completion tests, 135 
Pattern construction tests, 5 
Pemberley (Assessment board), 1, 5, 
27, 28, 41, 53, 55, 64, 67, 75,
80, 81, 82, 94, 98, 99, 100, 105, 
112ff, 150 
candidate population norms, 80, 
case studies, 150ff 
correlations, 100ff 
factor analyses, 107-10, 111 
Intelligence Grading (I.G.), see
main entry 
аа. candidates, 


Performance grading, 159, 160, 
161, 162 


performance test battery, 49, 70, 
psychological conference, 6 
written test battery, 113 
grading from, 113, 159, 160, 161, 
162 


X-sample, 103, 105, 108 
Percentile norms, or percentiles, 
general population, 51, 83, 96 
Perception, 21, 52, 53, 151 
Performance rating (MacNeill), 6 
Performance test interview, or situa- 
tion, 1, 6-7 
Performance tests, 5 and passim
Perseveration, 65 
Persistence, 151 


Personnel Selection in the British
Army, 1944—Officers (film), 7,

Personality pointers, 7, 156 

Personality variables, 63, 106, 112 

Point-scales, 81ff, 86, 96-7, 113-4 

Points for further investigation, 22-4, 
38-9, 54, 67, 78, 112 


Pooled data from variant test forms: 
Progressive Matrices, 101, 106, 149 
Reasoning, 106, 108 

‘Power’, in intelligence test perfor-

mance, 130 

Premature ageing, 133 

Presence of mind, 51 

Presidents' reports (Pemberley), 152-

Primary factors, 112 

Foma Training Centres and Wings, 

12 


‘Progression’ in Matrices, 108 
Progressive Matrices (1938 series, or 
general references), 5, 80, 81, 94, 
97, 99, 107, 108, 110, 112, 113, 
123, 135, 158, 160, 162 
EDE with 1943 series (q.v.), 
10 
standards for French-speaking 
subjects, 149 
timed and untimed administration, 
124ff, 131, 132, 133, 
Progressive Matrices (unpublished 
1943 series), 5, 95, 97, 104, 107, 
108, 110 
Progressive Matrices (1947 (Child- 
ren’s) series), 94, 165 
Projective techniques, 2, 152, 153, 
155, 156 


Prompting, 4, 12-13, 30, 33-4, 35, 
3 


Protestation, 21 

Prudence, 50 

Psychiatrist, 7, 131, 132, 133 

Psychiatrists’ reports (Pemberley),


Psychologists’ reports (Pemberley), 
153, 158 
Psychopathic trend, 22 


Questionnaire data, 130 
Radio operator, 94 


Rapaport, D., 2, 8, 17, 18, 19, 20, 22, 
37, 141, 165 




Rapport, 119
Raven, J. C., 94, 123, 125, 165
Re-allocation, 83, 86, 123ff, 136 
Reasoning (S.P. Test 45), 5, 80, 81, 
94, 97, 99, 107, 108, 109, 110, 
111, 113, 148, 149, 158 
Translated versions, 5, 80, 99, 108, 
148ff
French version, 80, 99, 108, 160 
Polish version, 80, 99, 161 
Reassurance, 66 
Regression (age), 49-50 
Regression equations, 113 
Regular Commission Appeal Boards,


Rehabilitation, 66 
Rejection, 19 
Reliability, 23, 103 
of I.G. conversion, 112ff
of W.O.S.B. techniques, 5 
Research and Training Centre, 
W.O.S.B.s (R.T.C.), 5, 83, 94, 
123, 126, 135 
Restricted tests, 95, 98 
Re-test, 38 
Rigidity, 36, 63, 154 
Rorschach, 19 
colour shock, 27 
D, 37 
Rorschach, 
dd, 52 
inquiry, 22 
limits testing, 4 
rejection, 19 
Z (Beck's), 51 
Rosenzweig, S., 53
Rotation of axes, 110, 112, 142 


Saharov, L., 8-9 

Sampling conditions, 107, 126 
Scheerer, M., 41, 46, 50, 52, 163 
Е ог schizophrenic trend, 19, 


3 
Scoring sheets, 11, 57, 60 


Selection Grade (S.G.), 81, 113 
Self-description, 155 
Semeonoff, B., 9, 14, 19, 28, 41, 70, 
81, 82, 112, 150, 159, 165, 166 
Semeonoff-Vigotsky test, 8ff, 49, 51,
81, 82, 96, 100, 103, 106, 109, 
110, 111, 112, 114 
abandonment of near-correct solu- 
tion, 22 
accidental turning, 12, 21 
adjourned session, 151 
alternative methods of administra- 
tion, 23 
basic score, 14 
bizarre responses, 25 
blocking, 12, 151 
‘building’, 19
chance success, 16 
clues, 10, 11-12, 151, 159 
colour relationships, 25 
colours, 9, 19, 25 
compared with T-H, 27, 33, 36, 37, 
39, 100, 109 
comparison with Hanfmann-Kasanin material,
concrete reference, 15
defensible alternatives, 12, 17-18,


descriptions of performance, 151, 
155, 159, 160, 161 

dimensions, 9, 20, 25-6 

‘distribution’, solution by, 16, 19, 23, 25
duration of test, 16-17 
emotional reaction, 21 
explanation of solution, 13-14, 
152, 155, 159, 160 
failure, 19 
forced clues, 12, 21, 151 
geometrical properties of blocks, 
17, 19 
grade of solution, as score component, 13
inadequate solutions, 18ff 
interpretation, 18ff 
of colour-pairings, 25 




Semeonoff-Vigotsky test, 
inter-relationships of groups, 15 
irregularities, minute, 20 
‘isolating’ blocks, 13 
logical approach, 21, 151 
matching of blocks, 20, 155 
of sizes, 160 

misplaced blocks, 13 

names (nonsense syllables) applied 
to groups, 19 

parallel form of test, 23 

perceptual solution, 16, 159 

phenomenal size, 20, 26 

points for further investigation, 
22-4 


random manipulation, 21 
reliability, 23 
restandardization by Laird, 81, 82 
score, formula for calculating, 16 
specifications, 24-6 
spontaneous comment, 22 
symbolism, 19 
‘tea-cloth’, 9, 10, 20 
texture of blocks, 20 
verbalization, 23, 151, 154 
volume as criterion, 15, 155 
weight as criterion, 15 
Services test standardizations, 80 
Set, 52 
Shenley Hospital, St. Alban’s, 39
Shipley, W. C., 95, 166 
Shortened Wechsler Verbal Scale, 81, 
97, 100, 111, 119ff
Nt content, elimination of, 
bw тарт; (sub-test), 
- correlations, 103-4, 119 
factor content, 111, 119 
partial credits, 121 
relation to Wechsler-Bellevue (q.v.), 
119, 121 
Similarities (sub-test), 119, 120 
Vocabulary (sub-test), 119, 120 
weighted scores, 121 


119, 


Simple structure, 107 
Skewing, effects of, 82, 91, 97, 126, 


Slater, P., 66, 164 

Social background, 154 

Social participation, 111, 112 

Sociometric assessment, 158 

Space perception factor, 109, 110 
tests, 70, 96, 106, 108 

Spearman, C. S., 6

Spearman-Brown prophecy formula,


Special abilities, 98, 108, 112 
Speed, of test performance, 58, 76, 
95, 123, 125, 154 
ами (S.P. Test 4), 96, 97, 100, 111, 
43 


Standard (Equivalent) Scores, 51, 
82ff, 94, 104, 119, 124, 126
assigned means and standard devi- 
ations, 82, 83ff, 96, 97, 
at Pemberley, 82, 84-6, 96, 113, 
149, 150, 154, 158, 159, 160 
at W.O.S.B.s., 82, 83, 86, 87-91, 
97, 119, 154 
at W.O.S.B.s. (Officers) (Re-alloca- 
tion), 83, 123, 124, 125, 126, 
127-30 
Mill Hill Vocabulary, 89, 125, 
129 
Mitchell Vocabulary, 89, 140 
Shortened Wechsler, and sub-tests, 
8, 
Summed Equivalent Score con- 
version, 86, 91, 96-7, 112ff, 130 
Standardized instructions, 56, 57, 
121, 124 
Stanford-Binet testing, 120 
Staying-power, 124 
Stop watch, 57, 58 
Stub of distribution, 126 
Sub-factors (Vernon), 106 
Suggestion, 63




Summary of norms quoted, 96-7 
of Жа for which norms available, 
9 


Summation of grades, 81, 113 
Superficiality, 38 

Superior ability, 97 

Symbolism, 19 

Synaesthesia, 19 

Synonyms, see Mill Hill Vocabulary 


Teachers’ ratings, 6 
Temperament trait ratings, 6 
Tension, 151 
Terman-Merrill scale, 5 
Test instructions, 2 
Test sophistication, 66 
Texture, 20, 36, 155 
Thematic Apperception Test
(T.A.T.), 65, 155
sample response, 156
Therapeutic closure, 33
Transfer, 27, 40
ТА ресі, 60, 70, 74, 76, 77,


Trist, E. L., 3, 4, 6, 27, 37, 41, 53, 166 
Trist, Virginia, 27, 166 
Trist-Hargreaves test, 6, 27ff, 81, 82, 
96, 100, 103, 106, 109, 111, 112, 
114
blocking, 38, 40 
‘building’, 36 
Colour combinations, 29, 31 
colour relationships, 36, 37
colour values, 28, 37 
column, arrangement in, 38 
comparison with CHS, 65 
comparison with S-V, 27, 33, 36, 
37, 39, 100, 109 
defensible alternatives, 37 
descriptions of performance, 37, 
distribution, solution by, 37 
earlier draft of test, 28, 31 
edge colour, 28, 30




Trist-Hargreaves test, 
emotional reaction, 38 
group administration (Lovell), 39 
inner (edge) colour, 34, 155 
‘integrity’, 29 
irregularity in material, 36 
mixing of pieces, 30 
order of items, 28 
over-determination of solution, 32 
pairing principle, 32 
Pairs items 7 and 9, 33, 35 
parallel forms of test, 39-40 
partially correct solution, 33 
penalties, 33, 38 
points for further investigation, 38-9
sample piece, 31-2 
specifications, 28-9 
systematic layout, 38 
texture of pieces, 36, 155 
underside of pieces, 29, 36 
unsatisfactory solutions, 36-7 


Trist-Misselbrook-Kohs, 3, 4, 41, 
80, 81, 82, 95, 96, 97, 105, 106, 
10, 109, 110, 113, 115, 149, 158, 
16 


accidental error, 49 

additional series, 46-7 

age-norms, 94, 97 

analytic approach, 46, 52, 154 

black and white version, 49, 54 

‘boards’, 42 

comparison with CHS, 49, 51, 66 

concrete approach, 46, 50, 52 

confirmatory test, use as, 100, 119 

demonstration, 42, 47 

descriptions of performance, 151,
154
Designs: A1, 154; B5 48; C1 52;
C2 48, 51, 52, 154; C3 52; C4
52; D1 51, 52; D2 50, 51, 52;
D3 50, 52; D4 50
diagonal construction, 43 
direction of movement, 53 
figure-ground relationships, 43, 46 



forcing into position, 151 

‘frames’, 42, 46, 52, 151 

handling of blocks, 50-1 

headroom, lack of, 51, 53, 54 

interchangeability of blocks, 50 

learning, 41, 51 

mirror-halves, 43 

mixing blocks, 51 

Naval recruit norms, 83 

outside frame, construction of 
design, 51 

points for further investigation, 54 

principles of successive series, 43ff 

reduction in scale, 151 

Repeat, 42, 46, 48, 51-2, 54, 80, 96,
107, 113, 151, 159, 162

decrease in raw score, 52, 159 
шше in factor structure, 


slowing down on easier prob- 
lems, 52 
rotation, 43, 51 
sample board and problems, 42, 43,


‘spoiled’ designs, 49 

time-limits, 42, 47, 48, 53, 54, 154 

total-matching approach, 46, 50, 
51


upside-down construction, 46 
T-scores, 91, 92-3, 94, 97 
Unauthorized material, 20, 26 
University students, 91, 97 
U.S. services, 53


Variance, 104, 105, 109, 111, 142 

v:ed (factor), 107, 109, 142, 160 

Verbal intelligence, 130, 131 

Verbal Intelligence Test (V.I.T.-
S.P. Test 15), 95, 97, 100, 110

Verbal test material, 4, 120, 135, 141 

Verbalization in a performance test, 
23, 152, 154 




Vernon, P. E., 99, 106, 123, 135, 142,
166


Vigotsky, L., 9, 28 
test, 4, 6, 8 (see also Semeonoff- 
Vigotsky) 
Visual memory, 99
Vocabulary tests, 89, 95, 123, 136 
(see also Mill Hill, Mitchell, 
Shortened Wechsler) 


Walshaw, J. B., 123, 125, 165 
War Office Selection Boards
(W.O.S.B.s), 1, 5, 7, 83, 113, 119,
120, 121, 124, 126
(Officers) (Re-allocation pro-
cedure), 123ff, 136
standard test battery, 83, 94, 104, 
119, 141, 142 
Wechsler Adult Intelligence Scale, 
121


Wechsler-Bellevue Intelligence Scale, 
4, 119, 120, 121
Arithmetical Reasoning, 120 
Coding, 94 
Information (sub-test), 120 
Memory Span (sub-test), 120 
Performance scale, 120 
Verbal scale, 120 
weighted scores, 121 
(see also Shortened Wechsler) 
Wechsler, D., 136, 166 
Weigl, E., 6, 27, 166
Word Association (projective tech- 
nique), 155 
Word-play, 149 
one quieres tests, 109, 110, 
5 


Wyatt, E., 65, 166 


z-conversion (Fisher’s), 106
Zero intelligence, 86


